DISCOVERING SUBSTRUCTURE IN EXAMPLES

64
1988 CI!.- C -E:"G-88-2223 TED L. .u30 R.:\ TORY of Engineering DISCOVERING SUBSTRUCTURE IN EXAMPLES Lawrence Bruce Holder, Jr. OF ILLINOIS AT URBAN, .. Appr,Y.:ed Pubiic Release, Distribution L'nlinuted.

Transcript of DISCOVERING SUBSTRUCTURE IN EXAMPLES

Page 1: DISCOVERING SUBSTRUCTURE IN EXAMPLES

~ay 1988 CI-C -EG-88-2223

COORDI~-TED SCIE~CE L u30 RTORY CJi~ege of Engineering

DISCOVERING SUBSTRUCTURE IN EXAMPLES

Lawrence Bruce Holder Jr

t~I-ERSITY OF ILLINOIS AT URBAN-CH~rv[PAIG~

ApprYed r~r Pubiic Release Distribution Lnlinuted

------------~------------------~~~~~~~~--~~~--------------------------------REPOAT DOCUMENTATION AGE OI10lT SiCftlTY CiASSPI(A flON

Unc las s te1 ed

it NAME OF IIpoundRfQAMING ORGANIZATlON

Coordinated Science Lab University of Illinois

k AOORESS (Citybull $ fI1III l~ (OOI

1101 W Springfield Avenue Urbana IL 61801

~ME OF FUNDING J SPONSORING ORGArZATlON ~S ~ I (l IOAPA

k AOORESs(CiC) StItbullbull ZI

Jas 0 in 2 t en bull gt bull C ) ~R~ 52Ar un ~ton A ____ _

Arli~zton VA 2~~OQ_23ng

6111 OFFICI SYMIOI (I Iblf

NIl

111 OFI(( SYMIOL (If 1ItJIi(10III1

~isc~ver~ Substructure i~ ~xaMles

Il IiIUISONAI AIJHOItI~older Jr bull Lawrence 3ruce

13 rtPE JF RlPORT achnial t11b rIME CD~EREO

-ROM 0

0 ill$Tll(1I 4AftI(INGS

~one

l OISTIUTCNIA~UTY 0 velr Approved for public releasej distribution un11~1ted

7bull ~AMI OF MONITORING OAGMILtTlON

~S~ 0~R and DARPA

7b AOO155 (City Stiff rwI llli (OtIfII ~shingtgn Dr 1055~

Aclin~ton ~A ~202

Arlin~ton ~A 22n9-23QR t fItOCUQMINT NSTftUMINT IDiNTlFlCA1ION NUMIER ~SF IST-85-11170 ~OOOL4-32-K-01R6 ~o0014-~i-K-0874 -

TASK NO

COSAn coon bull SU8JICT rEitMS CormllCl4t Ott l1~ru 1 usury WI OItrtI~ br 0I0dt MJlftOfrJ

lachine learning substJC cre ~ iscover S3DtE

i5 htslS dllCflb-S a methoci for discoering substructure concepts in eX3mJ)les 7hl method inoes a computa llonail constrained bn-orst search Iuuied by four heuristiCS coglitle ~alnt~ compactness connectivity and covenge Elc heuristic slescrbed n detail aion~ Ith ts rOie in tllultln ln mciciltal SU-stTJCIUre concept The SlBDLl s-slm ~Ill ImJlemencs the method contains l substructure discoe-y module l substructure specialiZ3tlon module nl i Incemer-tal substructure aCKiround knolecige module for applYlni jleously discoered substructure oncepl 71e sub~tructue back round lo~led~e Includes both user-delineci ~nd SSJlE-Jiscovertd sub~lruCturtS n a hierarch ampt s used to jetene bC1 sulstructures are present In the Input examplelt 7he ~ynem has periormed Qell on nllnoer oi ~iamC5 r)m dllfe~ent domainS and has discoerecl man rtetlt5Ilnpound ~ubstructure concepls such 1$ a )omlc 1~t )11 Taco-operllor for ltacillng blocks he Tc-od and mpltTentltion oi tle StBDLE systen u d~~C=IC(d lci In lUl~lI$ oi txpe-tmentll result s presented

20 1ali ilOIli AVAlioAIIUTY OF A8STRACT l1 ISTIACT S~JIUTY CuSIFI(ATlON tl N(USIFIEO~NIMITEO a 5AM( AS ~T o OrC USEitS tncass ~aa

jl APlIIGlllon TIl oe uS44 ojMII tampllwIt All otr edltlO re OOl4let bull

OISCOVERIG StBSTRCCTLRE 1 EX--lPLES

8Y

L- WRECE 8RLCE HOLDER JR

8S Lmiddotrive-sity of Illinois at Lrbana-Cbampaign 1986

THESIS

Submitted in partlll fulfillment of tne requIrements fvr tne degree of -laster of Sciene in Computer Science

in the Graduau Cgtlege of the Lnwersity of Illinois at Lrbana-Champaign 1988

Lrbana IllinOIS

Lawrence Bruce Holder Jr 1S Department of Computer Science

tniversity of Illinois at trbana-Champaign 1988 Robert Earl Stepp m Advisor

T~llS thesis describes a method for disltovering substructure oncepts in uamples The method

Involves a computationally constrained best-first search guieed by four heuristics cognitive

savings compactness connectivity and coverage Each heuristic s described in detail along with its

role In ealuatng an individual substructure concept The SlBOlE system that implements the

method contains a substructure discovery module a substructure specialization module and an

mcemenal substructure background knowledge module for applying previously discovered

substructure concepts The substructure background knowledge includes both user-defined and

SLBDLE-discovered substructures in a hierarchy that is used to determine which substructures art

present in the input eumples The system bas performed well on a number of eumples from

dderent domains and has discovered many interesting substructure conceU sucb as an aromatic

rrg and a macro-operator for stacking blocks The nethod and implementation of the SlBDLE

sstem are described and an analysis of experimental results is resented

iii

IJould like 0 thank my advisor Robert Stepp for ~S uoerous onrbutors tJ te etas

within this thesIs His talent for Insigbtful thinking and patient communicatlon has nade c

work both enjoyable and rewarding

I would also like to thanJc Brad Whiteball Bob Reinke Bharat Rao Diane COOk and Bob 1041

for their comments and suggestions regarding this work Tlanks also to the rest of the Artficai

Intelligence Researlh Group at the Coordinated Science Laboratory especially to my office mate

~ott Benr~ett for his patient participation in our discussions Special thanks to our s~e3r

Fancle Bridges for her ever jleasant assistance

Finally I would like to thank my family for their enthusiastic support and encouragement

brougcout my academic endeavors

This researCl was partIally supported by the ationa1 Science Foundation under grant SF

IST-85-11170 the Offe of ~aval Researcll under grant ~OOO14-82-Kmiddot0186 by Defense Advanced

Research ProJ~ts Agency under grant ~OOOl$-ampi-K-Oamp7$ and by a gIft from Texas rnstrumenu

Inc

T vlLE OF CO-TETS

CHAPTER

1 ITRODlCTIO

2 SlBSTRlcrtRE DISCQER- ~ ~ 5 21 1otivations 6 22 Substructure ReprSentatlon

I

23 Substructure Gone-ation 11 2- Substructure Soeleclon 13 25 Substructure Instantiatlon 17 26 Substructure Specialization 19 21 Background Knowledge 21 2S Substructure Dlsovery ~lgorltln

3 THE SlBDLE SYSTE)( 25 31 Substructure Representation 26 32 Heurlstlc-Based Substructure Discovery 25

321 GeneratIon 25 322 Selection 29 323 ExamFle 32

33 Substructure Specialization 35 331 Specialization Algorithm 35 332 Esampie 37

34 Substructure Background Knowledge 39

3~1 ~rchltecture -10 342 Identifying Substructures in Et~nplts -13

EXPERI(E-n -16

41 Experiment 1 VarYlng the Computational limit -16 ~2 Experiment 2 Specialization and Background Knowledge 19 3 Experiment 3 DisJverln~ Classifying A~trlbues In )lultilie xamplS 53 - Experiment~ Dls~vertng Iacro-Operators in Proof Trees 56 S EXet1ment S Data Abstraction and Feature Forllatlon 59

c REL~TED VORK 62 51 Gestalt Pshoiogy 62

52 Discovenrg Grop5 f Objects In Winstgtnmiddots ARCH P-cgram 63 ~3 Cvintlve OptlrlzJt~n in Wolffs SPR Pcgrar 67 S Substucure Dtsoery In middotbitalis PL-D System 69

CH~TER

6 COCL tSIO

6 1 Suoary () 2 Fmiddottue Resea-cb ~ _ ~

611 Substucture DiSCOvery and InstantIation 6

622 Background Knowledge 75 623 ApplicatIOns 79

REFERECES 51

APPEDIX A EXPERIYIET D-T~ 5amp AI Experiment 1 56 A2 Expenment 2 93

~ 3 Experllcent 3 99

~ Etperiment J 102 AS Experiment 5 104

vi

QPTER 1

LTRODtCI10N

At any Iven moment the amount of detailed information available fr~m an envlronmel s

overwhelmIng For elample dose observation of a brick wall reveals not only tbe rOINS d

rectang11ar brIckS but also tbe mortAr between tbe bricks the pitted surface of the brIcks and

mortar small cracks In tbe bricks etc Yet hUmans bave the ability to Ignore such detail ard

ettract infermation from the environment at a level of detail that is appropriate for he purpose cf

tbe ocservatlon Even n an unfamiliar environment lumans ignore intricate det1il and iscert

more abstract patterns in tle stimuli of the environment [Witkin83] This thesis describes a

com~utatonal mettod for discovermg abstract patterns or substruct~re in the descriptions 0 a

strucured enVrgtnment

Vhen obser atlons at varying levels of detail are necessary ~uruns are caable of descendrg

nto be more minute structure of the environmental stimuli and centlfymg patterns In arms f

hese stl1cural primitlves From the observations at different levels of detail humans na

construct a lle-archical description of the envirlnoent For nstance tbe brIckmiddot loll an be

descrbed as he rectangular bricks in tbe wall along with tbe interconnecons betmiddotveen tle mcilts

Furthermore each rick can be described in terms of the pitS racks and embedded graIns in he

bnck and each interconnecion can be described by the cl11pcnelts oi h~ mortar that ombme ~

form tle IntercoMecion Thus each Ievei in ~he descrption ~ierarchy rerresents I different eel

of detail for represen~ing ~he environment

Once such a bleoarchy is constructed similar prmitive struc res in diffeent middotWlCrmels

nay suggest tle Uistence of mere abstract featres higher ill n he hlerarhy Tls s~rltl

1

nty reFrese1ng the s~osJc~lre ~S slmpLfrg tergra lU lttl-lt~S I- lcdr e

1ewly ciscoveed substr-cure is s~iaiized by aire~jng adcli0ral s~rmiddotJ~ re T1S )lta ~

s~=stJcture descees l ~ager nore sptcfic poton of the -If~rt set of llut ltumies --e

suostJctU baclqrunc knowledge rcludes botll user-defined and discovered subs~J

definitions These are used to identify instances of substructures tilat exist In a sIVer set of ili

erampies Chapter 2 concludes by outlining a substructure discovery algortlm Incorporatng ~he

aforementlcned cmponents

Chapter 3 describes the implementation of the substructure discovery nethodoiogy ontained

In the StmiddotBDtmiddotE system In addition to the substructure discovery module StBDLE also lcludes a

substructure specaliZltion module and a substructure background know1edge module After a

substructure is d~oveed the substructure lS specIalized and botil tile original and secaiized

substructures are stored in tle background knowlCdge Within the background iinomiddotlledge the

substructures are kept in a hierarclly tbat defines omplu substructures in terms of more primltve

substructures tpon receiving a new set of input examples the background know~dgt rnodJe

determines which of the stored suostructures are resent in the etamples ThJs as he SLBDLE

system runs a hie3rcllical representation of selected structures found in he env ironnent ~s

constructed within tlle background knowledge modJle

Chapter presents several experiments witll the StBDLE systec EXFements wall tne

s1bstructure discovery module indicate that the suostucture generation and substructure seiecton

processes triorm vell in guiding the search towards more interestln~ substructures Other

experiments incorporating botll the substructure background keowledge module lnd spetallutlon

module demonstate tbe performance improvement obtaned by using revlousiy earned

sulstructures In subsequent discovery tASks The input and output data for each exerlent are

isted in ApperdlI A

Chapter strleys -revious work -elated to suostJcure disc)ery T~lS ~orK ilS ak l be euly 1900s Imiddotlen gestalt psychologists began sucying the InderYI~g ncess5 Lmiddotec i

3

CHAPTER 2

stBSTRUCTtJRE DISCOVERY

Te major focus of t1is work IS to investigate methods for discoveing substructure concepts

in examples Substructure dlScovery is tle process of identfYlng concepts desclbl1g n~ere~tIni

and repetitive -hunks- of structure within the individual elements of a set of structured examples

Once such 3 substructure oncept s discovered the descrIptions of the examples can be Simplified

by replacing all occurences of ~he substructure ith a single form that represents the

substructure The simplified descriptions may then be passed to other learning syst~ms In

lddition tbe discovery process nay be apFlied repetitively to urther simplify the deserlptions or

to build a hierarchical interpreuticn of the uamples in terms of hell subparts

ThIS hapter presents ~he components involved in a cgtmilItatlonal nethod for SiJostrJcture

discovery SKtion 21 disc1sses the motivations for developing suet a method anc he importance

of substructure discovery in the 5eld of artificial intelligence Seton 22 1eftnes a stbn~crWt

along with the components comprising a substructure [etods for geneatlng alte-1ative

substructures are presented 1ft Section 21 and the tK1niques used for inteHigently selecting among

these alternatives are discussed in Section 24 Once an appropriate substructure 15 discovered the

substructure is irurantUud for ncb occur-ence of the substructure in the input examples Sect~on

2 discusses substructure irstantiatl0ft ewly discovered sUOstuetures ~an be ipecialized to

describe larger substruc~ures in the input eumples Section 26 disc~sses suosrlctlre

s~aization Section 27 presents methods for using background lo~ledge in the SIoStrlctlre

dlscovery process Finllly Section 28 outlines a substluure Es-oery alsolthm lncorpcrawg

the previously described t~hni~ues

the observaton By tercelvmg only he appropriate Informatior lumans U~ bener abie to el-

the concepts implied by the environment

Thus the undedyu~ motivations for investigating substructure ~overy are the IblqlJlto~

hlenrchical structure in the world and 1e ease with Ntllch bumans can negotiate the henrchy

With an appropriate representation for substructures and the substJcture hierarchy a

substructure discovery sysem can perform the asks Just presented

22 Substructure Representation

A substructure is both a portion of a collection of structurally-related obects and a

cesclption of the concept represented by that portion For uample the detaied structure

CJmposlng ttle concet of a brIck is a substucure of a brck wall Tle collecion of atoms ard

bones compnslng he concept of an aromatic -ing 5 a substrucure of nany Clrgarllc ~cmpouncs

However substrucure concepts are ~ot always nteresng LPOl encountenng aJr1ck wal the

concept of a brick nay not be as interesting as tohe oncept Jf a doorway or Window T~e task of

SUls-ructure discovery is to ~nd i11lDurirtg subsluctue middotoncepts In a given 5-cicltlon d

structurally-related objects The representation of strllcured oncepts sbould be onducive to tile

wk of discovering substructures This sectior resents a grapblcal repfese~tatlon for suctufee

conceptS and a language for describing tne concepts

A coUlCtion of strJctured objects C3n be epfesentec by 1 sTaph CSlng il gnpl

representation a substJctlre is a collection of annotated nodes and eages comt=rlslng a onne~ed

slt~rlph J a larier jTaph The nodes epresenl single oects aics or eiltltols in the

Slbstuc~le lnd tlle edges -epresent the ion~lCtilrs emiddotmiddoteen ea~cns lnd ler arglrents Fr

annctated Ntb gt and o~e In~tl~ed Nltll gtft Te en odt s rnected c ~tl te a ngtce Jnd te

SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St

elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute

Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e

Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the

nput nampie n Figure 211 IS

lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt

red-I ill

green-i SI ~ ~------------~~~lt- --- OBJECT-0001

) ----Rl

1amp --- OBJECT-0002 amp-blue

I~l 154- red LshyI I

I j

blue blue

(1) I1put Example ()) SubstJcure

Fig1Jre 21 Exa~plt Substrlc~re

9

[nput Example Substlcture

A)

V

(b) Object Overlapping Substructure

(c) Object and Relation Overlapping Substructure

Figure 22 Disjoint and Overlappin Substructures

Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~

~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the

densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle

lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1

factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl

Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e

(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing

subsuIctures and substructure speialization to attach contextual nformation to newiy discovered

substructures

2S Substructure mstampl1tialioD

Once an interesting substructure is disltovered the input eumples can be recast by replactng

the obJects and relations of each occurrence of the substructure with a single entity representing

tbe abstact substructure This replacement is termed substructure illstanliatwlt After

pltrforming one substructure instantiation a substrueture discovery system may continue to

discover more abstract substructures in terms of those already instantiated in the input eumples

This section considers two methods of substrueture instantiation

One method of substructure instantiation replaees eaeb occurrence by a single object

Difficulties in using this method arise from representation roblems Although all the ob]fCts and

relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not

replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be

maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only

confounds the insuntiaion problem In the case of object overlap each instartation of the

overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of

relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as

well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre

instantiation using a singie new object Although single object substructure instantiation seens

intuitively promising tbe accompanying representation problems are difficult to overcome

An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of

the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the

slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the

17

3 SubStructure Generation

AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~

and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l

bottom-up and top-down

The bottom-up approach to substructure generation be~lns with the smallest substrJctures n

the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished

by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0

the substructure For enmple according to tbe tllee neighboring relations (presented in Section

2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the

substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures

lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt

lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt

lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt

The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O

S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For

example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10

suestrJures

ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt

The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie

suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0

smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be

11

lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1

jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_

uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly

dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec

substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1

are applicable here

Regardless of how tae metbods are applied each metbod bas advantages and disadvantages

Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev

SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle

combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a

larger substructure but a smaller more desirable substructure may be overlooked in the process

Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS

referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones

linimal expansion begins with smaller substructures expands substructures along one relation

lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource

imlts of tbe system

2~ Substructure Selection

Aier using tbe metbods of the previous sectlon to construct l set of alternatlve

substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0

onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e

roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie

substructures oased on ~her heuristic quality This section presents the major heuristics that 3re

Jpplicable to substructure evaluation

The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata

cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie

savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e

13

Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad

eiatlons of ~be origmal OCClrre1lces may be reconstructea

T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=

replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla

obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue

ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of

concepts deAned In teClS of the Instantiated substructures

26 Substructure Specialization

Specializing substructures is an essential capabtlity of a substructure discovery system For

Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f

mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences

Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the

aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached

atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this

specialization step allows the substructure discovery system to take advantage of additional

information in the input examples avoid learning overly general substructure concepts and Clore

rapidly discover a specific disjunctive substructure concet T115 section resents an approach to

substructure specialization

One technique for specializing a substructure is to ~rform inductive Inference on the

extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the

substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended

occurrences consists of the substructures obtained by upanding each occurrence llth each

neighboring relation of the occurrence Once the set of eltended occurrences is onstructed

Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added

neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg

relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres

19

2 - Background Knowledampe

Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s

wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e

discovery process along more promising paths through tbe space of alternatlve substructJres T~lS

section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate

substructures that are more likely to list in tbe current application dOClaln

The ~wo major functions of tbe substructure backround knowledge are to main tain boh

lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a

glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of

substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu

substructures are defined in terms of more primitive substructures This arrangement suggests ~n

archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie

substructures serve as the justifications for more complex substructures at a higher level in tle

hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote

substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie

substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures

elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying

tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput

uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are

ultimately justified by the relations in tbe input eumples The substructure discoey roess

may then use these background knowledge substructures as a starting point for ieneratllg

alternative substructures

Other f Jrms of background knowledge apply to the task of substrlcure ilscovery

mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses

~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee

1

L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do

BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do

if (not (member e 0raquo S S U leI

return D

Figure 24 SubstrUcture Discovery ~lgoritbm

Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom

the substructures In S until either the computational limlt L is exceeded or the set of candidate

substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe

heuristics of Section 2A are employed to choose the best substructure from the llternatives in S

The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he

implementation Section 322 descibes the comrutatlon used in StBDlE although otbe

methods may be used For instance the heutlStics could be weighted selected by lhe user or

selected by lhe backcround knowledae However uy substructure selecion method should

involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in

BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of

discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi

Section 23 are then used to construct a set of new substructures that are stored in E Those

subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the

loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res

23

CHAPTER 3

mE SUBDUE SYSTE~I

An implementation of he substructure discovery methodology descrIbed in the frevous

chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments

Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a

substructure concept discoverer and as a mod le in a more robust nachlne learning system In

addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a

sbstructure specialization module and a substructure background knowledge module for utiiizilg

prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd

krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres

ovillCh of these substruc1res are resent in the lnput examples

i

SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of

SubstructJre Speeializer-k of(

C ser-Supplied Input Exampies Substructure Discovery

Figure 31 Tlle SlSDLE System

(a) SUbstructure

[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]

(b) External Representation

I I

~ Sl

i V

~-T t -

I

~ Cl i

(c) Inte~al Represe~tation

Figure 32 Substructure Representation

21

SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do

SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI

return -EWSlBS

Figure 33 Substructure Elpanslon Algorithm

Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae

stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are

stored in For each neighboring relation a new substructure is formed by adding the nelghborrg

-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to

the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een

considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes

~xanded from the original substructure

322 Selection

The heuristic-based substructure discovery module selects for orslderation those

substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs

compactness connectivity and covenge These four heurlstus are used 0 order he set of

alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut

ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~

selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons

~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he

eurstlc value of a substructur

29

urlent set of Input xam-llS

deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e

substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le

Input examples

r rtlations Sl compactnns(S) = ----shy

For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus

conpactness bull 4 bull 1

The third heUristic connectivity measures the amount of extemal connection in 1he

occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of

external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be

onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed

connectivit-Spound) = locel

Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In

Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure

22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO

outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons

Thus connectivity (12 41 bull 13

T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es

31

ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt

First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~

tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as

there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle

substructure as denned in Section 3U The Object names within the substructures le aroltrar

symbols generated by the system for eacb ewly constructed substructure

S bull i lt 11 OBJECT-0001gt f

~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S

and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by

addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5

ae st)red In E

Input Example Substructure

amp i 511

~I Rl I- lSi ~

52 I S I

LJ L

Figure 3-1 Simple Example

33

11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~

1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l

le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~

suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~

by the algorabm

33 Sllbstructure Specialization

The substructure specialiution module In SLBDLE employs t simple teChlllq ue r

s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25

SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added

attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e

)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh

the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e

i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota

eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d

knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e

s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton

rocess

331 Specia1ization Algorithm

Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te

algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~

elamrles

he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~

Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l

osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~

eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e

35

~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor

Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle

mount of specializatlon In Srpc

A-rount_of_spKlaLization( S_) = --_______ (occl

The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture

acKiround knowledge along w1tb the originally discovered substrucure

332 Example

As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a

lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute

-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he

sare substruc~ure emerges as in Figure 3-

S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt

ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule

Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES

A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l

~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli

s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt

Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~

SPECSLBS In thIs example IS

S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt

T~is specialized substructure also las J occurrerces but 3 untque values thus

amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC

specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he

substrucnre background knowledge along with the oriiinally discovered substucur

SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon

about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be

s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred

substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq

SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue

nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization

nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be

mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec

s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks

3 SubstMlcture Background Kllowledge

T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts

39

O T SHA ETRL~GLE SHAPE-SQLARE

Figure 37 Substructure Backgolnd KnowledgeEumpie

ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt

su~ported

As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge

lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle

ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s

SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y

Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute

est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt

he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn

n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and

~1

1--- ~ed v blue ~

i I shyI

I~

I

I

I

I

SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE

Fgure 39 Three Background Knoledge $ucstroJcJres

he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e

new substructu-es In terms of the substructures already known

3 bull 42 IdentifyiD1 Substructures in Eumples

As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-

~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___

~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I

i SH~PETRIAGlE i t

I I

I I I SHAPEmiddotSQlARE

[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]

Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]

Fig-ure 310 Substructure IdentiAcation Eumple

substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles

disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks

--

lll=H Eumple - r

i III

I I H

8r- shyj

V V

Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red

C Value(S)a8

L 6 a

al uelt S)-312

alueltS)-21

0 alue(S)96

middotalue(S)-11

6 I

bt

10 ~ V

- -

I

alue(S)middot31~ aue(S~-96 -- - middotalue(S)92

A I

j

aI ValueS)-312 I

I I

I bull

---

- -I

Vaiue(S)-103

rgure 41 First Etample of EXiIrinent 1

elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS

of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll

EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$

overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0

consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r

be Input examples

2 Ex~riment 2 Specialization and Background Knowledge

The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem

Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0

~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS

lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege

During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task

As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut

discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs

acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa

Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed

subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk

The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure

-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e

cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s

shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese

discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized

substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed

rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces

the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed

substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe

background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to

deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe

background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres

Ii

C 0

H- C C - Ii C1

(Sr C1 rl

C C ~

C C C-H C C- ii C C-Ii 1 i

~ Sr- C C C

H-C C-Ii H C

Ii H

H

fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule

13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples

tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~

of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna

clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn

whIch to focus ile attentIon of the conceptual clusterI~ rocess

A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA

Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer

lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific

Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e

added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features

t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and

CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered

Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe

SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas

r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single

)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d

subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle

substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure

spannng ncre han one ualple

Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni

and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle

53

nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy

descrlOed il [10gensel87J the two best d usterngs discovered are

Color of engine Aheels IS Blackmiddot middotWhitemiddot

W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd

subs-~c~~re discovery =odule is

lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt

In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By

addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI

exa~Fles Je ~wo best lusterngs discovered lre

umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four

Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec

JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure

attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen

har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE

T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~

suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the

StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne

IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree

that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators

within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover

macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~

CiscJsses similrlties and differences between SLBDLE and PLoSD

Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure

of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The

Jperators forths conaln are taken from [isson80] and are repeated e10w

PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY

PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)

STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)

LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)

For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec

e

51

STACK(XZ)

~ CSTACKCYZ) PICKLPOi)

PCTDOW(Y)

Figure 49 Diseovered (alto-operator

system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy

vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be

~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex

45 Eltperiment 5 Data Abstraction and Feature Formation

EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he

cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3

arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng

on the same Tu3S Instruments Eplorer as the SLBDCE system

Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les

given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores

to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in

the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of

178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J

5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C

SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c

61

--

~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll

Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~

hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH

program searCles for tWo tpes of substrucure In tbe blocks world domain The first type

involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a

set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by

the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes

of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall

1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he

ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of

SlBDLE

lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S

cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce

OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence

~ ~ shyB

I ~ (

E G- c c=7 0 IA B CD

i Lr

(a) ( )

63

A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1

0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL

E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL

The tommcm-realiotships-list contairs the four relations Ossessed by more than half the

candidates

Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt

~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le

Sumber of propUiH In Intltwto

Number of propenlH n Ilnlon

vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-

ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre

A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5

65

~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e

ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~

difficult Running both tbe sequence and common relation discovery processes nore tban once

mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl

difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd

domain

T~e final comFarlson of 11e two syseS Involves ~he representation used to store the

discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the

subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe

sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot

lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1

roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse

Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines

comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe

semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto

oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0

SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert

background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l

more general epresentaticn such as the semantic network used y ~he ARCH pro~ran

53 COfDitive Optimization in olrs S~R Program

R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte

~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt

[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S

Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-

-(5-s)C =---shy

s

slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s

represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1

polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a

inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR

seeks a grammar whose eiementS m8llnize eVe

S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n

322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number

of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f

OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings

of he substrucre can be upressed as

Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy

cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not

conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on

he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into

~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture

lmprovements to tbe cognltive savings measure

~ Substructure Oiscovery ill Uihitehalls PLAD System

Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons

These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn

69

The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as

Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs

efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda

overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta

mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence

Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ

COXTEXT 1 ogsav macro OS

100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot

8 5 ~u - (~tmiddot z)middot

Instantiated Action Sequlnce lt B 112 Z ~11J Z

COlEXT 2 ogsav macro~s

20 ~l bull (~lumiddot Z)shy

Instantiated Action ~uence A B 11

Al interesting macrops discovered End

Figure 53 PLAXD Etam~le

1

OLPTER 6

COSC1USION

Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S

of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty

Url about such an environnent the entity nust abstract over unInuresting jetJil and discover

substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s

teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced

domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l

this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve

61 Summary

The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl

onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s

aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e

environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge

equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl

aleratves nvoiec In 3 computational method 01 discovering sustrucure

A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y

he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans

omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg

t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre

imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5

T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~

iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~

he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~

optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~

The ~ialization module modina tile substructure to apply in a more constrained ewironme

Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i

nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S

aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f

be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes

and the possible applications of the SLBDLE substructure discovery system in a vanety f

domaltls

Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf

~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-

lass vf problems in the area of nachine learning OJeraton In real-world domains demands f

eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates

n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll

of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes

Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1

S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~

62 Future Research

$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE

system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei

include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS

Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy

reVIOUS suess of silIlLu rJe appiiauns

esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of

substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can

tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]

lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can

be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions

in both the background knowledge architecture 3nd the opeations peforned by the discovery

algorlthm

Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess

may arise from a better method of substrlctie instantiation C rrent letbocs do not

satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation

by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C

lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres

exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba

Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods

would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery

process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt

the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be

discovered with a single application of the discovery algcnthr1

Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~

uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve

the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery

rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge

z~ess l ~

One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount

knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a

knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts

623 Applications

Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste

SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and

compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung

systems

One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage

composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e

eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon

slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of

abstraction

The n~ to access information fom larse databases has betooe oonlonlac In ncs

omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so

do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver

repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard

Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte

database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture

sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1

9

REFERE~CES

[Doyie79J

[HoaS3]

L --]~ ason

~Iicalski33bl

G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)

J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162

J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27

R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288

K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987

W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933

W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947

J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J

General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16

J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977

D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC

~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239

R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy

R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13

R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363

S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3

T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50

J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp

~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~

83

- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es

DeJ IS 11

dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T

s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11

nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11

race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T

Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order

=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je

~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form

(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I

The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC

The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure

ltela01s s the list ~f ~eatlons deining le substrucure r occurenc

In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage

dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec

85

gt bull ~

)~c1 tt I

0 c5 ~ -

13 Output for First Example of Experiment 1 Limit - 6

3qll rilebullbullbullbull

anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~

Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll

S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi

-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I

(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l

Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD

-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i

S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)

11 aCC

A14 Output for First Example of E(periment 1 Limit 10

gt svIdue liait 10)

n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t

Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco

T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~

S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO

~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS

~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J

A 16 Input for Second Example ot Experiment 1

Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J

conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)

Al7 Output for Second Example ot Etperitnent 1 Limit - 4

l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I

5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs

c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I

middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt

middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I

So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS

coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j

ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103

~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~

89

S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a

ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO

~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H

ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09

lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()

~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull

[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I

l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i

[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~

C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i

SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1

co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i

middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))

Ec ~lce

Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10

Betti ~Ice

i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~

umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1

SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -

1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G

91

s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no

TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~

~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f

=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ

~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i

A2 Experiment 2

Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd

saostructure background knowledge nodule Two examples from the chemical dooain ae

eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng

previously disc ered subst-uc~ures to subsequent discovery tasks

11 Input for First Eumple of poundXperimenl 2

kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i

S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))

slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)

sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J

~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI

Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))

VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)

( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I

(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i

(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I

sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)

Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j

SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)

~o e ) ~Wlnt ubstUC~oltes

Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo

95

gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~

U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l

~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J

carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )

Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))

Defxollrii ilill J ~cIl Clr carJ)

I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)

Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J

ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))

A32 Output for Experiment 3

gt lubclue inl~ JO)

Seli tlcebullbull_

m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil

SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo

aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot

ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~

licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)

Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc

wri OCCtilJtOiCS

101

4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4

~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i

(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))

42 Output for Experiment 4

PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil

Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)

ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl

u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1

O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~

s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI

W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -

~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a

S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1

TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a

103

~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~

Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-

bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)

ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)

I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))

r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l

(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)

(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)

coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)

Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i

(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at

Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e

(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ

~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj

OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs

I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten

~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull

9

SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1

I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1

Addina substructures 0 SKbullbullbull

O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1

5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I

A3 Experimeut 3

Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15

hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le

desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e

99

sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~

5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~

1~Ie c2 lt6 ~u)

QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32

c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)

lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J

SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))

A52 Output for Experiment S

gt (sulxh bullbull lmu )

Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll

RUMinl sulntucture discovey

DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl

Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~

WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11

( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I

(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I

[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt

~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90

Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J

ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot

~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I

10$

bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D

icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)

([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD

((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I

([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~

101

middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a

oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0

CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1

Ed ~lCe

109

Page 2: DISCOVERING SUBSTRUCTURE IN EXAMPLES

------------~------------------~~~~~~~~--~~~--------------------------------REPOAT DOCUMENTATION AGE OI10lT SiCftlTY CiASSPI(A flON

Unc las s te1 ed

it NAME OF IIpoundRfQAMING ORGANIZATlON

Coordinated Science Lab University of Illinois

k AOORESS (Citybull $ fI1III l~ (OOI

1101 W Springfield Avenue Urbana IL 61801

~ME OF FUNDING J SPONSORING ORGArZATlON ~S ~ I (l IOAPA

k AOORESs(CiC) StItbullbull ZI

Jas 0 in 2 t en bull gt bull C ) ~R~ 52Ar un ~ton A ____ _

Arli~zton VA 2~~OQ_23ng

6111 OFFICI SYMIOI (I Iblf

NIl

111 OFI(( SYMIOL (If 1ItJIi(10III1

~isc~ver~ Substructure i~ ~xaMles

Il IiIUISONAI AIJHOItI~older Jr bull Lawrence 3ruce

13 rtPE JF RlPORT achnial t11b rIME CD~EREO

-ROM 0

0 ill$Tll(1I 4AftI(INGS

~one

l OISTIUTCNIA~UTY 0 velr Approved for public releasej distribution un11~1ted

7bull ~AMI OF MONITORING OAGMILtTlON

~S~ 0~R and DARPA

7b AOO155 (City Stiff rwI llli (OtIfII ~shingtgn Dr 1055~

Aclin~ton ~A ~202

Arlin~ton ~A 22n9-23QR t fItOCUQMINT NSTftUMINT IDiNTlFlCA1ION NUMIER ~SF IST-85-11170 ~OOOL4-32-K-01R6 ~o0014-~i-K-0874 -

TASK NO

COSAn coon bull SU8JICT rEitMS CormllCl4t Ott l1~ru 1 usury WI OItrtI~ br 0I0dt MJlftOfrJ

lachine learning substJC cre ~ iscover S3DtE

i5 htslS dllCflb-S a methoci for discoering substructure concepts in eX3mJ)les 7hl method inoes a computa llonail constrained bn-orst search Iuuied by four heuristiCS coglitle ~alnt~ compactness connectivity and covenge Elc heuristic slescrbed n detail aion~ Ith ts rOie in tllultln ln mciciltal SU-stTJCIUre concept The SlBDLl s-slm ~Ill ImJlemencs the method contains l substructure discoe-y module l substructure specialiZ3tlon module nl i Incemer-tal substructure aCKiround knolecige module for applYlni jleously discoered substructure oncepl 71e sub~tructue back round lo~led~e Includes both user-delineci ~nd SSJlE-Jiscovertd sub~lruCturtS n a hierarch ampt s used to jetene bC1 sulstructures are present In the Input examplelt 7he ~ynem has periormed Qell on nllnoer oi ~iamC5 r)m dllfe~ent domainS and has discoerecl man rtetlt5Ilnpound ~ubstructure concepls such 1$ a )omlc 1~t )11 Taco-operllor for ltacillng blocks he Tc-od and mpltTentltion oi tle StBDLE systen u d~~C=IC(d lci In lUl~lI$ oi txpe-tmentll result s presented

20 1ali ilOIli AVAlioAIIUTY OF A8STRACT l1 ISTIACT S~JIUTY CuSIFI(ATlON tl N(USIFIEO~NIMITEO a 5AM( AS ~T o OrC USEitS tncass ~aa

jl APlIIGlllon TIl oe uS44 ojMII tampllwIt All otr edltlO re OOl4let bull

OISCOVERIG StBSTRCCTLRE 1 EX--lPLES

8Y

L- WRECE 8RLCE HOLDER JR

8S Lmiddotrive-sity of Illinois at Lrbana-Cbampaign 1986

THESIS

Submitted in partlll fulfillment of tne requIrements fvr tne degree of -laster of Sciene in Computer Science

in the Graduau Cgtlege of the Lnwersity of Illinois at Lrbana-Champaign 1988

Lrbana IllinOIS

Lawrence Bruce Holder Jr 1S Department of Computer Science

tniversity of Illinois at trbana-Champaign 1988 Robert Earl Stepp m Advisor

T~llS thesis describes a method for disltovering substructure oncepts in uamples The method

Involves a computationally constrained best-first search guieed by four heuristics cognitive

savings compactness connectivity and coverage Each heuristic s described in detail along with its

role In ealuatng an individual substructure concept The SlBOlE system that implements the

method contains a substructure discovery module a substructure specialization module and an

mcemenal substructure background knowledge module for applying previously discovered

substructure concepts The substructure background knowledge includes both user-defined and

SLBDLE-discovered substructures in a hierarchy that is used to determine which substructures art

present in the input eumples The system bas performed well on a number of eumples from

dderent domains and has discovered many interesting substructure conceU sucb as an aromatic

rrg and a macro-operator for stacking blocks The nethod and implementation of the SlBDLE

sstem are described and an analysis of experimental results is resented

iii

IJould like 0 thank my advisor Robert Stepp for ~S uoerous onrbutors tJ te etas

within this thesIs His talent for Insigbtful thinking and patient communicatlon has nade c

work both enjoyable and rewarding

I would also like to thanJc Brad Whiteball Bob Reinke Bharat Rao Diane COOk and Bob 1041

for their comments and suggestions regarding this work Tlanks also to the rest of the Artficai

Intelligence Researlh Group at the Coordinated Science Laboratory especially to my office mate

~ott Benr~ett for his patient participation in our discussions Special thanks to our s~e3r

Fancle Bridges for her ever jleasant assistance

Finally I would like to thank my family for their enthusiastic support and encouragement

brougcout my academic endeavors

This researCl was partIally supported by the ationa1 Science Foundation under grant SF

IST-85-11170 the Offe of ~aval Researcll under grant ~OOO14-82-Kmiddot0186 by Defense Advanced

Research ProJ~ts Agency under grant ~OOOl$-ampi-K-Oamp7$ and by a gIft from Texas rnstrumenu

Inc

T vlLE OF CO-TETS

CHAPTER

1 ITRODlCTIO

2 SlBSTRlcrtRE DISCQER- ~ ~ 5 21 1otivations 6 22 Substructure ReprSentatlon

I

23 Substructure Gone-ation 11 2- Substructure Soeleclon 13 25 Substructure Instantiatlon 17 26 Substructure Specialization 19 21 Background Knowledge 21 2S Substructure Dlsovery ~lgorltln

3 THE SlBDLE SYSTE)( 25 31 Substructure Representation 26 32 Heurlstlc-Based Substructure Discovery 25

321 GeneratIon 25 322 Selection 29 323 ExamFle 32

33 Substructure Specialization 35 331 Specialization Algorithm 35 332 Esampie 37

34 Substructure Background Knowledge 39

3~1 ~rchltecture -10 342 Identifying Substructures in Et~nplts -13

EXPERI(E-n -16

41 Experiment 1 VarYlng the Computational limit -16 ~2 Experiment 2 Specialization and Background Knowledge 19 3 Experiment 3 DisJverln~ Classifying A~trlbues In )lultilie xamplS 53 - Experiment~ Dls~vertng Iacro-Operators in Proof Trees 56 S EXet1ment S Data Abstraction and Feature Forllatlon 59

c REL~TED VORK 62 51 Gestalt Pshoiogy 62

52 Discovenrg Grop5 f Objects In Winstgtnmiddots ARCH P-cgram 63 ~3 Cvintlve OptlrlzJt~n in Wolffs SPR Pcgrar 67 S Substucure Dtsoery In middotbitalis PL-D System 69

CH~TER

6 COCL tSIO

6 1 Suoary () 2 Fmiddottue Resea-cb ~ _ ~

611 Substucture DiSCOvery and InstantIation 6

622 Background Knowledge 75 623 ApplicatIOns 79

REFERECES 51

APPEDIX A EXPERIYIET D-T~ 5amp AI Experiment 1 56 A2 Expenment 2 93

~ 3 Experllcent 3 99

~ Etperiment J 102 AS Experiment 5 104

vi

QPTER 1

LTRODtCI10N

At any Iven moment the amount of detailed information available fr~m an envlronmel s

overwhelmIng For elample dose observation of a brick wall reveals not only tbe rOINS d

rectang11ar brIckS but also tbe mortAr between tbe bricks the pitted surface of the brIcks and

mortar small cracks In tbe bricks etc Yet hUmans bave the ability to Ignore such detail ard

ettract infermation from the environment at a level of detail that is appropriate for he purpose cf

tbe ocservatlon Even n an unfamiliar environment lumans ignore intricate det1il and iscert

more abstract patterns in tle stimuli of the environment [Witkin83] This thesis describes a

com~utatonal mettod for discovermg abstract patterns or substruct~re in the descriptions 0 a

strucured enVrgtnment

Vhen obser atlons at varying levels of detail are necessary ~uruns are caable of descendrg

nto be more minute structure of the environmental stimuli and centlfymg patterns In arms f

hese stl1cural primitlves From the observations at different levels of detail humans na

construct a lle-archical description of the envirlnoent For nstance tbe brIckmiddot loll an be

descrbed as he rectangular bricks in tbe wall along with tbe interconnecons betmiddotveen tle mcilts

Furthermore each rick can be described in terms of the pitS racks and embedded graIns in he

bnck and each interconnecion can be described by the cl11pcnelts oi h~ mortar that ombme ~

form tle IntercoMecion Thus each Ievei in ~he descrption ~ierarchy rerresents I different eel

of detail for represen~ing ~he environment

Once such a bleoarchy is constructed similar prmitive struc res in diffeent middotWlCrmels

nay suggest tle Uistence of mere abstract featres higher ill n he hlerarhy Tls s~rltl

1

nty reFrese1ng the s~osJc~lre ~S slmpLfrg tergra lU lttl-lt~S I- lcdr e

1ewly ciscoveed substr-cure is s~iaiized by aire~jng adcli0ral s~rmiddotJ~ re T1S )lta ~

s~=stJcture descees l ~ager nore sptcfic poton of the -If~rt set of llut ltumies --e

suostJctU baclqrunc knowledge rcludes botll user-defined and discovered subs~J

definitions These are used to identify instances of substructures tilat exist In a sIVer set of ili

erampies Chapter 2 concludes by outlining a substructure discovery algortlm Incorporatng ~he

aforementlcned cmponents

Chapter 3 describes the implementation of the substructure discovery nethodoiogy ontained

In the StmiddotBDtmiddotE system In addition to the substructure discovery module StBDLE also lcludes a

substructure specaliZltion module and a substructure background know1edge module After a

substructure is d~oveed the substructure lS specIalized and botil tile original and secaiized

substructures are stored in tle background knowlCdge Within the background iinomiddotlledge the

substructures are kept in a hierarclly tbat defines omplu substructures in terms of more primltve

substructures tpon receiving a new set of input examples the background know~dgt rnodJe

determines which of the stored suostructures are resent in the etamples ThJs as he SLBDLE

system runs a hie3rcllical representation of selected structures found in he env ironnent ~s

constructed within tlle background knowledge modJle

Chapter presents several experiments witll the StBDLE systec EXFements wall tne

s1bstructure discovery module indicate that the suostucture generation and substructure seiecton

processes triorm vell in guiding the search towards more interestln~ substructures Other

experiments incorporating botll the substructure background keowledge module lnd spetallutlon

module demonstate tbe performance improvement obtaned by using revlousiy earned

sulstructures In subsequent discovery tASks The input and output data for each exerlent are

isted in ApperdlI A

Chapter strleys -revious work -elated to suostJcure disc)ery T~lS ~orK ilS ak l be euly 1900s Imiddotlen gestalt psychologists began sucying the InderYI~g ncess5 Lmiddotec i

3

CHAPTER 2

stBSTRUCTtJRE DISCOVERY

Te major focus of t1is work IS to investigate methods for discoveing substructure concepts

in examples Substructure dlScovery is tle process of identfYlng concepts desclbl1g n~ere~tIni

and repetitive -hunks- of structure within the individual elements of a set of structured examples

Once such 3 substructure oncept s discovered the descrIptions of the examples can be Simplified

by replacing all occurences of ~he substructure ith a single form that represents the

substructure The simplified descriptions may then be passed to other learning syst~ms In

lddition tbe discovery process nay be apFlied repetitively to urther simplify the deserlptions or

to build a hierarchical interpreuticn of the uamples in terms of hell subparts

ThIS hapter presents ~he components involved in a cgtmilItatlonal nethod for SiJostrJcture

discovery SKtion 21 disc1sses the motivations for developing suet a method anc he importance

of substructure discovery in the 5eld of artificial intelligence Seton 22 1eftnes a stbn~crWt

along with the components comprising a substructure [etods for geneatlng alte-1ative

substructures are presented 1ft Section 21 and the tK1niques used for inteHigently selecting among

these alternatives are discussed in Section 24 Once an appropriate substructure 15 discovered the

substructure is irurantUud for ncb occur-ence of the substructure in the input examples Sect~on

2 discusses substructure irstantiatl0ft ewly discovered sUOstuetures ~an be ipecialized to

describe larger substruc~ures in the input eumples Section 26 disc~sses suosrlctlre

s~aization Section 27 presents methods for using background lo~ledge in the SIoStrlctlre

dlscovery process Finllly Section 28 outlines a substluure Es-oery alsolthm lncorpcrawg

the previously described t~hni~ues

the observaton By tercelvmg only he appropriate Informatior lumans U~ bener abie to el-

the concepts implied by the environment

Thus the undedyu~ motivations for investigating substructure ~overy are the IblqlJlto~

hlenrchical structure in the world and 1e ease with Ntllch bumans can negotiate the henrchy

With an appropriate representation for substructures and the substJcture hierarchy a

substructure discovery sysem can perform the asks Just presented

22 Substructure Representation

A substructure is both a portion of a collection of structurally-related obects and a

cesclption of the concept represented by that portion For uample the detaied structure

CJmposlng ttle concet of a brIck is a substucure of a brck wall Tle collecion of atoms ard

bones compnslng he concept of an aromatic -ing 5 a substrucure of nany Clrgarllc ~cmpouncs

However substrucure concepts are ~ot always nteresng LPOl encountenng aJr1ck wal the

concept of a brick nay not be as interesting as tohe oncept Jf a doorway or Window T~e task of

SUls-ructure discovery is to ~nd i11lDurirtg subsluctue middotoncepts In a given 5-cicltlon d

structurally-related objects The representation of strllcured oncepts sbould be onducive to tile

wk of discovering substructures This sectior resents a grapblcal repfese~tatlon for suctufee

conceptS and a language for describing tne concepts

A coUlCtion of strJctured objects C3n be epfesentec by 1 sTaph CSlng il gnpl

representation a substJctlre is a collection of annotated nodes and eages comt=rlslng a onne~ed

slt~rlph J a larier jTaph The nodes epresenl single oects aics or eiltltols in the

Slbstuc~le lnd tlle edges -epresent the ion~lCtilrs emiddotmiddoteen ea~cns lnd ler arglrents Fr

annctated Ntb gt and o~e In~tl~ed Nltll gtft Te en odt s rnected c ~tl te a ngtce Jnd te

SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St

elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute

Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e

Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the

nput nampie n Figure 211 IS

lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt

red-I ill

green-i SI ~ ~------------~~~lt- --- OBJECT-0001

) ----Rl

1amp --- OBJECT-0002 amp-blue

I~l 154- red LshyI I

I j

blue blue

(1) I1put Example ()) SubstJcure

Fig1Jre 21 Exa~plt Substrlc~re

9

[nput Example Substlcture

A)

V

(b) Object Overlapping Substructure

(c) Object and Relation Overlapping Substructure

Figure 22 Disjoint and Overlappin Substructures

Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~

~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the

densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle

lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1

factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl

Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e

(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing

subsuIctures and substructure speialization to attach contextual nformation to newiy discovered

substructures

2S Substructure mstampl1tialioD

Once an interesting substructure is disltovered the input eumples can be recast by replactng

the obJects and relations of each occurrence of the substructure with a single entity representing

tbe abstact substructure This replacement is termed substructure illstanliatwlt After

pltrforming one substructure instantiation a substrueture discovery system may continue to

discover more abstract substructures in terms of those already instantiated in the input eumples

This section considers two methods of substrueture instantiation

One method of substructure instantiation replaees eaeb occurrence by a single object

Difficulties in using this method arise from representation roblems Although all the ob]fCts and

relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not

replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be

maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only

confounds the insuntiaion problem In the case of object overlap each instartation of the

overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of

relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as

well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre

instantiation using a singie new object Although single object substructure instantiation seens

intuitively promising tbe accompanying representation problems are difficult to overcome

An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of

the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the

slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the

17

3 SubStructure Generation

AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~

and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l

bottom-up and top-down

The bottom-up approach to substructure generation be~lns with the smallest substrJctures n

the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished

by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0

the substructure For enmple according to tbe tllee neighboring relations (presented in Section

2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the

substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures

lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt

lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt

lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt

The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O

S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For

example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10

suestrJures

ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt

The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie

suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0

smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be

11

lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1

jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_

uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly

dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec

substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1

are applicable here

Regardless of how tae metbods are applied each metbod bas advantages and disadvantages

Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev

SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle

combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a

larger substructure but a smaller more desirable substructure may be overlooked in the process

Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS

referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones

linimal expansion begins with smaller substructures expands substructures along one relation

lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource

imlts of tbe system

2~ Substructure Selection

Aier using tbe metbods of the previous sectlon to construct l set of alternatlve

substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0

onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e

roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie

substructures oased on ~her heuristic quality This section presents the major heuristics that 3re

Jpplicable to substructure evaluation

The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata

cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie

savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e

13

Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad

eiatlons of ~be origmal OCClrre1lces may be reconstructea

T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=

replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla

obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue

ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of

concepts deAned In teClS of the Instantiated substructures

26 Substructure Specialization

Specializing substructures is an essential capabtlity of a substructure discovery system For

Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f

mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences

Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the

aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached

atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this

specialization step allows the substructure discovery system to take advantage of additional

information in the input examples avoid learning overly general substructure concepts and Clore

rapidly discover a specific disjunctive substructure concet T115 section resents an approach to

substructure specialization

One technique for specializing a substructure is to ~rform inductive Inference on the

extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the

substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended

occurrences consists of the substructures obtained by upanding each occurrence llth each

neighboring relation of the occurrence Once the set of eltended occurrences is onstructed

Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added

neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg

relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres

19

2 - Background Knowledampe

Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s

wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e

discovery process along more promising paths through tbe space of alternatlve substructJres T~lS

section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate

substructures that are more likely to list in tbe current application dOClaln

The ~wo major functions of tbe substructure backround knowledge are to main tain boh

lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a

glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of

substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu

substructures are defined in terms of more primitive substructures This arrangement suggests ~n

archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie

substructures serve as the justifications for more complex substructures at a higher level in tle

hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote

substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie

substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures

elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying

tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput

uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are

ultimately justified by the relations in tbe input eumples The substructure discoey roess

may then use these background knowledge substructures as a starting point for ieneratllg

alternative substructures

Other f Jrms of background knowledge apply to the task of substrlcure ilscovery

mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses

~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee

1

L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do

BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do

if (not (member e 0raquo S S U leI

return D

Figure 24 SubstrUcture Discovery ~lgoritbm

Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom

the substructures In S until either the computational limlt L is exceeded or the set of candidate

substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe

heuristics of Section 2A are employed to choose the best substructure from the llternatives in S

The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he

implementation Section 322 descibes the comrutatlon used in StBDlE although otbe

methods may be used For instance the heutlStics could be weighted selected by lhe user or

selected by lhe backcround knowledae However uy substructure selecion method should

involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in

BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of

discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi

Section 23 are then used to construct a set of new substructures that are stored in E Those

subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the

loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res

23

CHAPTER 3

mE SUBDUE SYSTE~I

An implementation of he substructure discovery methodology descrIbed in the frevous

chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments

Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a

substructure concept discoverer and as a mod le in a more robust nachlne learning system In

addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a

sbstructure specialization module and a substructure background knowledge module for utiiizilg

prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd

krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres

ovillCh of these substruc1res are resent in the lnput examples

i

SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of

SubstructJre Speeializer-k of(

C ser-Supplied Input Exampies Substructure Discovery

Figure 31 Tlle SlSDLE System

(a) SUbstructure

[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]

(b) External Representation

I I

~ Sl

i V

~-T t -

I

~ Cl i

(c) Inte~al Represe~tation

Figure 32 Substructure Representation

21

SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do

SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI

return -EWSlBS

Figure 33 Substructure Elpanslon Algorithm

Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae

stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are

stored in For each neighboring relation a new substructure is formed by adding the nelghborrg

-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to

the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een

considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes

~xanded from the original substructure

322 Selection

The heuristic-based substructure discovery module selects for orslderation those

substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs

compactness connectivity and covenge These four heurlstus are used 0 order he set of

alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut

ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~

selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons

~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he

eurstlc value of a substructur

29

urlent set of Input xam-llS

deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e

substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le

Input examples

r rtlations Sl compactnns(S) = ----shy

For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus

conpactness bull 4 bull 1

The third heUristic connectivity measures the amount of extemal connection in 1he

occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of

external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be

onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed

connectivit-Spound) = locel

Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In

Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure

22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO

outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons

Thus connectivity (12 41 bull 13

T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es

31

ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt

First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~

tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as

there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle

substructure as denned in Section 3U The Object names within the substructures le aroltrar

symbols generated by the system for eacb ewly constructed substructure

S bull i lt 11 OBJECT-0001gt f

~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S

and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by

addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5

ae st)red In E

Input Example Substructure

amp i 511

~I Rl I- lSi ~

52 I S I

LJ L

Figure 3-1 Simple Example

33

11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~

1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l

le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~

suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~

by the algorabm

33 Sllbstructure Specialization

The substructure specialiution module In SLBDLE employs t simple teChlllq ue r

s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25

SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added

attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e

)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh

the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e

i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota

eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d

knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e

s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton

rocess

331 Specia1ization Algorithm

Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te

algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~

elamrles

he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~

Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l

osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~

eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e

35

~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor

Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle

mount of specializatlon In Srpc

A-rount_of_spKlaLization( S_) = --_______ (occl

The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture

acKiround knowledge along w1tb the originally discovered substrucure

332 Example

As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a

lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute

-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he

sare substruc~ure emerges as in Figure 3-

S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt

ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule

Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES

A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l

~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli

s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt

Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~

SPECSLBS In thIs example IS

S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt

T~is specialized substructure also las J occurrerces but 3 untque values thus

amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC

specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he

substrucnre background knowledge along with the oriiinally discovered substucur

SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon

about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be

s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred

substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq

SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue

nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization

nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be

mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec

s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks

3 SubstMlcture Background Kllowledge

T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts

39

O T SHA ETRL~GLE SHAPE-SQLARE

Figure 37 Substructure Backgolnd KnowledgeEumpie

ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt

su~ported

As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge

lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle

ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s

SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y

Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute

est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt

he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn

n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and

~1

1--- ~ed v blue ~

i I shyI

I~

I

I

I

I

SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE

Fgure 39 Three Background Knoledge $ucstroJcJres

he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e

new substructu-es In terms of the substructures already known

3 bull 42 IdentifyiD1 Substructures in Eumples

As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-

~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___

~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I

i SH~PETRIAGlE i t

I I

I I I SHAPEmiddotSQlARE

[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]

Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]

Fig-ure 310 Substructure IdentiAcation Eumple

substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles

disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks

--

lll=H Eumple - r

i III

I I H

8r- shyj

V V

Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red

C Value(S)a8

L 6 a

al uelt S)-312

alueltS)-21

0 alue(S)96

middotalue(S)-11

6 I

bt

10 ~ V

- -

I

alue(S)middot31~ aue(S~-96 -- - middotalue(S)92

A I

j

aI ValueS)-312 I

I I

I bull

---

- -I

Vaiue(S)-103

rgure 41 First Etample of EXiIrinent 1

elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS

of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll

EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$

overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0

consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r

be Input examples

2 Ex~riment 2 Specialization and Background Knowledge

The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem

Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0

~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS

lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege

During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task

As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut

discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs

acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa

Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed

subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk

The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure

-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e

cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s

shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese

discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized

substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed

rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces

the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed

substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe

background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to

deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe

background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres

Ii

C 0

H- C C - Ii C1

(Sr C1 rl

C C ~

C C C-H C C- ii C C-Ii 1 i

~ Sr- C C C

H-C C-Ii H C

Ii H

H

fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule

13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples

tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~

of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna

clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn

whIch to focus ile attentIon of the conceptual clusterI~ rocess

A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA

Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer

lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific

Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e

added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features

t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and

CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered

Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe

SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas

r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single

)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d

subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle

substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure

spannng ncre han one ualple

Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni

and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle

53

nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy

descrlOed il [10gensel87J the two best d usterngs discovered are

Color of engine Aheels IS Blackmiddot middotWhitemiddot

W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd

subs-~c~~re discovery =odule is

lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt

In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By

addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI

exa~Fles Je ~wo best lusterngs discovered lre

umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four

Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec

JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure

attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen

har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE

T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~

suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the

StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne

IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree

that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators

within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover

macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~

CiscJsses similrlties and differences between SLBDLE and PLoSD

Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure

of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The

Jperators forths conaln are taken from [isson80] and are repeated e10w

PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY

PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)

STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)

LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)

For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec

e

51

STACK(XZ)

~ CSTACKCYZ) PICKLPOi)

PCTDOW(Y)

Figure 49 Diseovered (alto-operator

system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy

vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be

~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex

45 Eltperiment 5 Data Abstraction and Feature Formation

EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he

cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3

arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng

on the same Tu3S Instruments Eplorer as the SLBDCE system

Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les

given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores

to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in

the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of

178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J

5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C

SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c

61

--

~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll

Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~

hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH

program searCles for tWo tpes of substrucure In tbe blocks world domain The first type

involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a

set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by

the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes

of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall

1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he

ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of

SlBDLE

lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S

cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce

OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence

~ ~ shyB

I ~ (

E G- c c=7 0 IA B CD

i Lr

(a) ( )

63

A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1

0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL

E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL

The tommcm-realiotships-list contairs the four relations Ossessed by more than half the

candidates

Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt

~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le

Sumber of propUiH In Intltwto

Number of propenlH n Ilnlon

vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-

ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre

A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5

65

~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e

ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~

difficult Running both tbe sequence and common relation discovery processes nore tban once

mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl

difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd

domain

T~e final comFarlson of 11e two syseS Involves ~he representation used to store the

discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the

subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe

sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot

lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1

roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse

Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines

comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe

semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto

oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0

SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert

background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l

more general epresentaticn such as the semantic network used y ~he ARCH pro~ran

53 COfDitive Optimization in olrs S~R Program

R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte

~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt

[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S

Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-

-(5-s)C =---shy

s

slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s

represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1

polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a

inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR

seeks a grammar whose eiementS m8llnize eVe

S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n

322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number

of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f

OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings

of he substrucre can be upressed as

Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy

cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not

conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on

he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into

~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture

lmprovements to tbe cognltive savings measure

~ Substructure Oiscovery ill Uihitehalls PLAD System

Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons

These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn

69

The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as

Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs

efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda

overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta

mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence

Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ

COXTEXT 1 ogsav macro OS

100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot

8 5 ~u - (~tmiddot z)middot

Instantiated Action Sequlnce lt B 112 Z ~11J Z

COlEXT 2 ogsav macro~s

20 ~l bull (~lumiddot Z)shy

Instantiated Action ~uence A B 11

Al interesting macrops discovered End

Figure 53 PLAXD Etam~le

1

OLPTER 6

COSC1USION

Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S

of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty

Url about such an environnent the entity nust abstract over unInuresting jetJil and discover

substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s

teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced

domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l

this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve

61 Summary

The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl

onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s

aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e

environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge

equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl

aleratves nvoiec In 3 computational method 01 discovering sustrucure

A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y

he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans

omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg

t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre

imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5

T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~

iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~

he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~

optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~

The ~ialization module modina tile substructure to apply in a more constrained ewironme

Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i

nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S

aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f

be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes

and the possible applications of the SLBDLE substructure discovery system in a vanety f

domaltls

Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf

~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-

lass vf problems in the area of nachine learning OJeraton In real-world domains demands f

eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates

n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll

of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes

Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1

S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~

62 Future Research

$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE

system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei

include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS

Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy

reVIOUS suess of silIlLu rJe appiiauns

esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of

substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can

tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]

lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can

be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions

in both the background knowledge architecture 3nd the opeations peforned by the discovery

algorlthm

Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess

may arise from a better method of substrlctie instantiation C rrent letbocs do not

satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation

by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C

lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres

exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba

Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods

would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery

process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt

the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be

discovered with a single application of the discovery algcnthr1

Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~

uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve

the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery

rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge

z~ess l ~

One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount

knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a

knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts

623 Applications

Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste

SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and

compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung

systems

One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage

composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e

eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon

slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of

abstraction

The n~ to access information fom larse databases has betooe oonlonlac In ncs

omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so

do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver

repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard

Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte

database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture

sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1

9

REFERE~CES

[Doyie79J

[HoaS3]

L --]~ ason

~Iicalski33bl

G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)

J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162

J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27

R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288

K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987

W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933

W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947

J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J

General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16

J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977

D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC

~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239

R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy

R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13

R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363

S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3

T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50

J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp

~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~

83

- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es

DeJ IS 11

dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T

s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11

nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11

race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T

Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order

=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je

~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form

(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I

The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC

The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure

ltela01s s the list ~f ~eatlons deining le substrucure r occurenc

In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage

dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec

85

gt bull ~

)~c1 tt I

0 c5 ~ -

13 Output for First Example of Experiment 1 Limit - 6

3qll rilebullbullbullbull

anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~

Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll

S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi

-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I

(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l

Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD

-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i

S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)

11 aCC

A14 Output for First Example of E(periment 1 Limit 10

gt svIdue liait 10)

n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t

Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco

T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~

S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO

~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS

~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J

A 16 Input for Second Example ot Experiment 1

Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J

conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)

Al7 Output for Second Example ot Etperitnent 1 Limit - 4

l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I

5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs

c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I

middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt

middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I

So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS

coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j

ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103

~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~

89

S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a

ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO

~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H

ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09

lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()

~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull

[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I

l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i

[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~

C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i

SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1

co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i

middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))

Ec ~lce

Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10

Betti ~Ice

i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~

umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1

SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -

1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G

91

s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no

TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~

~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f

=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ

~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i

A2 Experiment 2

Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd

saostructure background knowledge nodule Two examples from the chemical dooain ae

eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng

previously disc ered subst-uc~ures to subsequent discovery tasks

11 Input for First Eumple of poundXperimenl 2

kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i

S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))

slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)

sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J

~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI

Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))

VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)

( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I

(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i

(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I

sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)

Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j

SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)

~o e ) ~Wlnt ubstUC~oltes

Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo

95

gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~

U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l

~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J

carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )

Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))

Defxollrii ilill J ~cIl Clr carJ)

I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)

Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J

ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))

A32 Output for Experiment 3

gt lubclue inl~ JO)

Seli tlcebullbull_

m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil

SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo

aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot

ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~

licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)

Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc

wri OCCtilJtOiCS

101

4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4

~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i

(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))

42 Output for Experiment 4

PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil

Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)

ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl

u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1

O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~

s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI

W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -

~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a

S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1

TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a

103

~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~

Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-

bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)

ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)

I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))

r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l

(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)

(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)

coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)

Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i

(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at

Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e

(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ

~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj

OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs

I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten

~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull

9

SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1

I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1

Addina substructures 0 SKbullbullbull

O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1

5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I

A3 Experimeut 3

Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15

hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le

desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e

99

sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~

5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~

1~Ie c2 lt6 ~u)

QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32

c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)

lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J

SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))

A52 Output for Experiment S

gt (sulxh bullbull lmu )

Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll

RUMinl sulntucture discovey

DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl

Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~

WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11

( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I

(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I

[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt

~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90

Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J

ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot

~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I

10$

bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D

icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)

([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD

((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I

([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~

101

middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a

oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0

CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1

Ed ~lCe

109

Page 3: DISCOVERING SUBSTRUCTURE IN EXAMPLES

OISCOVERIG StBSTRCCTLRE 1 EX--lPLES

8Y

L- WRECE 8RLCE HOLDER JR

8S Lmiddotrive-sity of Illinois at Lrbana-Cbampaign 1986

THESIS

Submitted in partlll fulfillment of tne requIrements fvr tne degree of -laster of Sciene in Computer Science

in the Graduau Cgtlege of the Lnwersity of Illinois at Lrbana-Champaign 1988

Lrbana IllinOIS

Lawrence Bruce Holder Jr 1S Department of Computer Science

tniversity of Illinois at trbana-Champaign 1988 Robert Earl Stepp m Advisor

T~llS thesis describes a method for disltovering substructure oncepts in uamples The method

Involves a computationally constrained best-first search guieed by four heuristics cognitive

savings compactness connectivity and coverage Each heuristic s described in detail along with its

role In ealuatng an individual substructure concept The SlBOlE system that implements the

method contains a substructure discovery module a substructure specialization module and an

mcemenal substructure background knowledge module for applying previously discovered

substructure concepts The substructure background knowledge includes both user-defined and

SLBDLE-discovered substructures in a hierarchy that is used to determine which substructures art

present in the input eumples The system bas performed well on a number of eumples from

dderent domains and has discovered many interesting substructure conceU sucb as an aromatic

rrg and a macro-operator for stacking blocks The nethod and implementation of the SlBDLE

sstem are described and an analysis of experimental results is resented

iii

IJould like 0 thank my advisor Robert Stepp for ~S uoerous onrbutors tJ te etas

within this thesIs His talent for Insigbtful thinking and patient communicatlon has nade c

work both enjoyable and rewarding

I would also like to thanJc Brad Whiteball Bob Reinke Bharat Rao Diane COOk and Bob 1041

for their comments and suggestions regarding this work Tlanks also to the rest of the Artficai

Intelligence Researlh Group at the Coordinated Science Laboratory especially to my office mate

~ott Benr~ett for his patient participation in our discussions Special thanks to our s~e3r

Fancle Bridges for her ever jleasant assistance

Finally I would like to thank my family for their enthusiastic support and encouragement

brougcout my academic endeavors

This researCl was partIally supported by the ationa1 Science Foundation under grant SF

IST-85-11170 the Offe of ~aval Researcll under grant ~OOO14-82-Kmiddot0186 by Defense Advanced

Research ProJ~ts Agency under grant ~OOOl$-ampi-K-Oamp7$ and by a gIft from Texas rnstrumenu

Inc

T vlLE OF CO-TETS

CHAPTER

1 ITRODlCTIO

2 SlBSTRlcrtRE DISCQER- ~ ~ 5 21 1otivations 6 22 Substructure ReprSentatlon

I

23 Substructure Gone-ation 11 2- Substructure Soeleclon 13 25 Substructure Instantiatlon 17 26 Substructure Specialization 19 21 Background Knowledge 21 2S Substructure Dlsovery ~lgorltln

3 THE SlBDLE SYSTE)( 25 31 Substructure Representation 26 32 Heurlstlc-Based Substructure Discovery 25

321 GeneratIon 25 322 Selection 29 323 ExamFle 32

33 Substructure Specialization 35 331 Specialization Algorithm 35 332 Esampie 37

34 Substructure Background Knowledge 39

3~1 ~rchltecture -10 342 Identifying Substructures in Et~nplts -13

EXPERI(E-n -16

41 Experiment 1 VarYlng the Computational limit -16 ~2 Experiment 2 Specialization and Background Knowledge 19 3 Experiment 3 DisJverln~ Classifying A~trlbues In )lultilie xamplS 53 - Experiment~ Dls~vertng Iacro-Operators in Proof Trees 56 S EXet1ment S Data Abstraction and Feature Forllatlon 59

c REL~TED VORK 62 51 Gestalt Pshoiogy 62

52 Discovenrg Grop5 f Objects In Winstgtnmiddots ARCH P-cgram 63 ~3 Cvintlve OptlrlzJt~n in Wolffs SPR Pcgrar 67 S Substucure Dtsoery In middotbitalis PL-D System 69

CH~TER

6 COCL tSIO

6 1 Suoary () 2 Fmiddottue Resea-cb ~ _ ~

611 Substucture DiSCOvery and InstantIation 6

622 Background Knowledge 75 623 ApplicatIOns 79

REFERECES 51

APPEDIX A EXPERIYIET D-T~ 5amp AI Experiment 1 56 A2 Expenment 2 93

~ 3 Experllcent 3 99

~ Etperiment J 102 AS Experiment 5 104

vi

QPTER 1

LTRODtCI10N

At any Iven moment the amount of detailed information available fr~m an envlronmel s

overwhelmIng For elample dose observation of a brick wall reveals not only tbe rOINS d

rectang11ar brIckS but also tbe mortAr between tbe bricks the pitted surface of the brIcks and

mortar small cracks In tbe bricks etc Yet hUmans bave the ability to Ignore such detail ard

ettract infermation from the environment at a level of detail that is appropriate for he purpose cf

tbe ocservatlon Even n an unfamiliar environment lumans ignore intricate det1il and iscert

more abstract patterns in tle stimuli of the environment [Witkin83] This thesis describes a

com~utatonal mettod for discovermg abstract patterns or substruct~re in the descriptions 0 a

strucured enVrgtnment

Vhen obser atlons at varying levels of detail are necessary ~uruns are caable of descendrg

nto be more minute structure of the environmental stimuli and centlfymg patterns In arms f

hese stl1cural primitlves From the observations at different levels of detail humans na

construct a lle-archical description of the envirlnoent For nstance tbe brIckmiddot loll an be

descrbed as he rectangular bricks in tbe wall along with tbe interconnecons betmiddotveen tle mcilts

Furthermore each rick can be described in terms of the pitS racks and embedded graIns in he

bnck and each interconnecion can be described by the cl11pcnelts oi h~ mortar that ombme ~

form tle IntercoMecion Thus each Ievei in ~he descrption ~ierarchy rerresents I different eel

of detail for represen~ing ~he environment

Once such a bleoarchy is constructed similar prmitive struc res in diffeent middotWlCrmels

nay suggest tle Uistence of mere abstract featres higher ill n he hlerarhy Tls s~rltl

1

nty reFrese1ng the s~osJc~lre ~S slmpLfrg tergra lU lttl-lt~S I- lcdr e

1ewly ciscoveed substr-cure is s~iaiized by aire~jng adcli0ral s~rmiddotJ~ re T1S )lta ~

s~=stJcture descees l ~ager nore sptcfic poton of the -If~rt set of llut ltumies --e

suostJctU baclqrunc knowledge rcludes botll user-defined and discovered subs~J

definitions These are used to identify instances of substructures tilat exist In a sIVer set of ili

erampies Chapter 2 concludes by outlining a substructure discovery algortlm Incorporatng ~he

aforementlcned cmponents

Chapter 3 describes the implementation of the substructure discovery nethodoiogy ontained

In the StmiddotBDtmiddotE system In addition to the substructure discovery module StBDLE also lcludes a

substructure specaliZltion module and a substructure background know1edge module After a

substructure is d~oveed the substructure lS specIalized and botil tile original and secaiized

substructures are stored in tle background knowlCdge Within the background iinomiddotlledge the

substructures are kept in a hierarclly tbat defines omplu substructures in terms of more primltve

substructures tpon receiving a new set of input examples the background know~dgt rnodJe

determines which of the stored suostructures are resent in the etamples ThJs as he SLBDLE

system runs a hie3rcllical representation of selected structures found in he env ironnent ~s

constructed within tlle background knowledge modJle

Chapter presents several experiments witll the StBDLE systec EXFements wall tne

s1bstructure discovery module indicate that the suostucture generation and substructure seiecton

processes triorm vell in guiding the search towards more interestln~ substructures Other

experiments incorporating botll the substructure background keowledge module lnd spetallutlon

module demonstate tbe performance improvement obtaned by using revlousiy earned

sulstructures In subsequent discovery tASks The input and output data for each exerlent are

isted in ApperdlI A

Chapter strleys -revious work -elated to suostJcure disc)ery T~lS ~orK ilS ak l be euly 1900s Imiddotlen gestalt psychologists began sucying the InderYI~g ncess5 Lmiddotec i

3

CHAPTER 2

stBSTRUCTtJRE DISCOVERY

Te major focus of t1is work IS to investigate methods for discoveing substructure concepts

in examples Substructure dlScovery is tle process of identfYlng concepts desclbl1g n~ere~tIni

and repetitive -hunks- of structure within the individual elements of a set of structured examples

Once such 3 substructure oncept s discovered the descrIptions of the examples can be Simplified

by replacing all occurences of ~he substructure ith a single form that represents the

substructure The simplified descriptions may then be passed to other learning syst~ms In

lddition tbe discovery process nay be apFlied repetitively to urther simplify the deserlptions or

to build a hierarchical interpreuticn of the uamples in terms of hell subparts

ThIS hapter presents ~he components involved in a cgtmilItatlonal nethod for SiJostrJcture

discovery SKtion 21 disc1sses the motivations for developing suet a method anc he importance

of substructure discovery in the 5eld of artificial intelligence Seton 22 1eftnes a stbn~crWt

along with the components comprising a substructure [etods for geneatlng alte-1ative

substructures are presented 1ft Section 21 and the tK1niques used for inteHigently selecting among

these alternatives are discussed in Section 24 Once an appropriate substructure 15 discovered the

substructure is irurantUud for ncb occur-ence of the substructure in the input examples Sect~on

2 discusses substructure irstantiatl0ft ewly discovered sUOstuetures ~an be ipecialized to

describe larger substruc~ures in the input eumples Section 26 disc~sses suosrlctlre

s~aization Section 27 presents methods for using background lo~ledge in the SIoStrlctlre

dlscovery process Finllly Section 28 outlines a substluure Es-oery alsolthm lncorpcrawg

the previously described t~hni~ues

the observaton By tercelvmg only he appropriate Informatior lumans U~ bener abie to el-

the concepts implied by the environment

Thus the undedyu~ motivations for investigating substructure ~overy are the IblqlJlto~

hlenrchical structure in the world and 1e ease with Ntllch bumans can negotiate the henrchy

With an appropriate representation for substructures and the substJcture hierarchy a

substructure discovery sysem can perform the asks Just presented

22 Substructure Representation

A substructure is both a portion of a collection of structurally-related obects and a

cesclption of the concept represented by that portion For uample the detaied structure

CJmposlng ttle concet of a brIck is a substucure of a brck wall Tle collecion of atoms ard

bones compnslng he concept of an aromatic -ing 5 a substrucure of nany Clrgarllc ~cmpouncs

However substrucure concepts are ~ot always nteresng LPOl encountenng aJr1ck wal the

concept of a brick nay not be as interesting as tohe oncept Jf a doorway or Window T~e task of

SUls-ructure discovery is to ~nd i11lDurirtg subsluctue middotoncepts In a given 5-cicltlon d

structurally-related objects The representation of strllcured oncepts sbould be onducive to tile

wk of discovering substructures This sectior resents a grapblcal repfese~tatlon for suctufee

conceptS and a language for describing tne concepts

A coUlCtion of strJctured objects C3n be epfesentec by 1 sTaph CSlng il gnpl

representation a substJctlre is a collection of annotated nodes and eages comt=rlslng a onne~ed

slt~rlph J a larier jTaph The nodes epresenl single oects aics or eiltltols in the

Slbstuc~le lnd tlle edges -epresent the ion~lCtilrs emiddotmiddoteen ea~cns lnd ler arglrents Fr

annctated Ntb gt and o~e In~tl~ed Nltll gtft Te en odt s rnected c ~tl te a ngtce Jnd te

SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St

elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute

Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e

Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the

nput nampie n Figure 211 IS

lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt

red-I ill

green-i SI ~ ~------------~~~lt- --- OBJECT-0001

) ----Rl

1amp --- OBJECT-0002 amp-blue

I~l 154- red LshyI I

I j

blue blue

(1) I1put Example ()) SubstJcure

Fig1Jre 21 Exa~plt Substrlc~re

9

[nput Example Substlcture

A)

V

(b) Object Overlapping Substructure

(c) Object and Relation Overlapping Substructure

Figure 22 Disjoint and Overlappin Substructures

Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~

~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the

densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle

lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1

factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl

Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e

(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing

subsuIctures and substructure speialization to attach contextual nformation to newiy discovered

substructures

2S Substructure mstampl1tialioD

Once an interesting substructure is disltovered the input eumples can be recast by replactng

the obJects and relations of each occurrence of the substructure with a single entity representing

tbe abstact substructure This replacement is termed substructure illstanliatwlt After

pltrforming one substructure instantiation a substrueture discovery system may continue to

discover more abstract substructures in terms of those already instantiated in the input eumples

This section considers two methods of substrueture instantiation

One method of substructure instantiation replaees eaeb occurrence by a single object

Difficulties in using this method arise from representation roblems Although all the ob]fCts and

relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not

replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be

maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only

confounds the insuntiaion problem In the case of object overlap each instartation of the

overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of

relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as

well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre

instantiation using a singie new object Although single object substructure instantiation seens

intuitively promising tbe accompanying representation problems are difficult to overcome

An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of

the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the

slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the

17

3 SubStructure Generation

AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~

and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l

bottom-up and top-down

The bottom-up approach to substructure generation be~lns with the smallest substrJctures n

the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished

by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0

the substructure For enmple according to tbe tllee neighboring relations (presented in Section

2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the

substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures

lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt

lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt

lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt

The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O

S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For

example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10

suestrJures

ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt

The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie

suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0

smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be

11

lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1

jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_

uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly

dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec

substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1

are applicable here

Regardless of how tae metbods are applied each metbod bas advantages and disadvantages

Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev

SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle

combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a

larger substructure but a smaller more desirable substructure may be overlooked in the process

Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS

referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones

linimal expansion begins with smaller substructures expands substructures along one relation

lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource

imlts of tbe system

2~ Substructure Selection

Aier using tbe metbods of the previous sectlon to construct l set of alternatlve

substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0

onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e

roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie

substructures oased on ~her heuristic quality This section presents the major heuristics that 3re

Jpplicable to substructure evaluation

The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata

cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie

savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e

13

Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad

eiatlons of ~be origmal OCClrre1lces may be reconstructea

T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=

replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla

obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue

ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of

concepts deAned In teClS of the Instantiated substructures

26 Substructure Specialization

Specializing substructures is an essential capabtlity of a substructure discovery system For

Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f

mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences

Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the

aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached

atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this

specialization step allows the substructure discovery system to take advantage of additional

information in the input examples avoid learning overly general substructure concepts and Clore

rapidly discover a specific disjunctive substructure concet T115 section resents an approach to

substructure specialization

One technique for specializing a substructure is to ~rform inductive Inference on the

extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the

substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended

occurrences consists of the substructures obtained by upanding each occurrence llth each

neighboring relation of the occurrence Once the set of eltended occurrences is onstructed

Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added

neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg

relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres

19

2 - Background Knowledampe

Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s

wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e

discovery process along more promising paths through tbe space of alternatlve substructJres T~lS

section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate

substructures that are more likely to list in tbe current application dOClaln

The ~wo major functions of tbe substructure backround knowledge are to main tain boh

lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a

glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of

substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu

substructures are defined in terms of more primitive substructures This arrangement suggests ~n

archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie

substructures serve as the justifications for more complex substructures at a higher level in tle

hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote

substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie

substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures

elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying

tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput

uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are

ultimately justified by the relations in tbe input eumples The substructure discoey roess

may then use these background knowledge substructures as a starting point for ieneratllg

alternative substructures

Other f Jrms of background knowledge apply to the task of substrlcure ilscovery

mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses

~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee

1

L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do

BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do

if (not (member e 0raquo S S U leI

return D

Figure 24 SubstrUcture Discovery ~lgoritbm

Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom

the substructures In S until either the computational limlt L is exceeded or the set of candidate

substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe

heuristics of Section 2A are employed to choose the best substructure from the llternatives in S

The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he

implementation Section 322 descibes the comrutatlon used in StBDlE although otbe

methods may be used For instance the heutlStics could be weighted selected by lhe user or

selected by lhe backcround knowledae However uy substructure selecion method should

involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in

BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of

discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi

Section 23 are then used to construct a set of new substructures that are stored in E Those

subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the

loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res

23

CHAPTER 3

mE SUBDUE SYSTE~I

An implementation of he substructure discovery methodology descrIbed in the frevous

chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments

Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a

substructure concept discoverer and as a mod le in a more robust nachlne learning system In

addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a

sbstructure specialization module and a substructure background knowledge module for utiiizilg

prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd

krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres

ovillCh of these substruc1res are resent in the lnput examples

i

SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of

SubstructJre Speeializer-k of(

C ser-Supplied Input Exampies Substructure Discovery

Figure 31 Tlle SlSDLE System

(a) SUbstructure

[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]

(b) External Representation

I I

~ Sl

i V

~-T t -

I

~ Cl i

(c) Inte~al Represe~tation

Figure 32 Substructure Representation

21

SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do

SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI

return -EWSlBS

Figure 33 Substructure Elpanslon Algorithm

Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae

stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are

stored in For each neighboring relation a new substructure is formed by adding the nelghborrg

-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to

the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een

considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes

~xanded from the original substructure

322 Selection

The heuristic-based substructure discovery module selects for orslderation those

substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs

compactness connectivity and covenge These four heurlstus are used 0 order he set of

alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut

ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~

selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons

~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he

eurstlc value of a substructur

29

urlent set of Input xam-llS

deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e

substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le

Input examples

r rtlations Sl compactnns(S) = ----shy

For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus

conpactness bull 4 bull 1

The third heUristic connectivity measures the amount of extemal connection in 1he

occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of

external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be

onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed

connectivit-Spound) = locel

Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In

Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure

22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO

outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons

Thus connectivity (12 41 bull 13

T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es

31

ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt

First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~

tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as

there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle

substructure as denned in Section 3U The Object names within the substructures le aroltrar

symbols generated by the system for eacb ewly constructed substructure

S bull i lt 11 OBJECT-0001gt f

~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S

and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by

addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5

ae st)red In E

Input Example Substructure

amp i 511

~I Rl I- lSi ~

52 I S I

LJ L

Figure 3-1 Simple Example

33

11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~

1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l

le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~

suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~

by the algorabm

33 Sllbstructure Specialization

The substructure specialiution module In SLBDLE employs t simple teChlllq ue r

s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25

SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added

attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e

)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh

the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e

i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota

eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d

knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e

s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton

rocess

331 Specia1ization Algorithm

Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te

algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~

elamrles

he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~

Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l

osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~

eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e

35

~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor

Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle

mount of specializatlon In Srpc

A-rount_of_spKlaLization( S_) = --_______ (occl

The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture

acKiround knowledge along w1tb the originally discovered substrucure

332 Example

As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a

lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute

-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he

sare substruc~ure emerges as in Figure 3-

S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt

ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule

Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES

A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l

~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli

s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt

Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~

SPECSLBS In thIs example IS

S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt

T~is specialized substructure also las J occurrerces but 3 untque values thus

amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC

specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he

substrucnre background knowledge along with the oriiinally discovered substucur

SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon

about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be

s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred

substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq

SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue

nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization

nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be

mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec

s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks

3 SubstMlcture Background Kllowledge

T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts

39

O T SHA ETRL~GLE SHAPE-SQLARE

Figure 37 Substructure Backgolnd KnowledgeEumpie

ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt

su~ported

As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge

lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle

ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s

SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y

Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute

est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt

he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn

n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and

~1

1--- ~ed v blue ~

i I shyI

I~

I

I

I

I

SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE

Fgure 39 Three Background Knoledge $ucstroJcJres

he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e

new substructu-es In terms of the substructures already known

3 bull 42 IdentifyiD1 Substructures in Eumples

As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-

~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___

~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I

i SH~PETRIAGlE i t

I I

I I I SHAPEmiddotSQlARE

[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]

Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]

Fig-ure 310 Substructure IdentiAcation Eumple

substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles

disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks

--

lll=H Eumple - r

i III

I I H

8r- shyj

V V

Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red

C Value(S)a8

L 6 a

al uelt S)-312

alueltS)-21

0 alue(S)96

middotalue(S)-11

6 I

bt

10 ~ V

- -

I

alue(S)middot31~ aue(S~-96 -- - middotalue(S)92

A I

j

aI ValueS)-312 I

I I

I bull

---

- -I

Vaiue(S)-103

rgure 41 First Etample of EXiIrinent 1

elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS

of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll

EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$

overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0

consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r

be Input examples

2 Ex~riment 2 Specialization and Background Knowledge

The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem

Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0

~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS

lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege

During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task

As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut

discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs

acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa

Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed

subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk

The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure

-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e

cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s

shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese

discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized

substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed

rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces

the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed

substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe

background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to

deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe

background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres

Ii

C 0

H- C C - Ii C1

(Sr C1 rl

C C ~

C C C-H C C- ii C C-Ii 1 i

~ Sr- C C C

H-C C-Ii H C

Ii H

H

fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule

13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples

tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~

of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna

clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn

whIch to focus ile attentIon of the conceptual clusterI~ rocess

A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA

Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer

lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific

Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e

added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features

t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and

CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered

Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe

SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas

r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single

)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d

subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle

substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure

spannng ncre han one ualple

Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni

and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle

53

nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy

descrlOed il [10gensel87J the two best d usterngs discovered are

Color of engine Aheels IS Blackmiddot middotWhitemiddot

W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd

subs-~c~~re discovery =odule is

lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt

In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By

addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI

exa~Fles Je ~wo best lusterngs discovered lre

umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four

Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec

JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure

attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen

har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE

T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~

suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the

StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne

IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree

that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators

within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover

macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~

CiscJsses similrlties and differences between SLBDLE and PLoSD

Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure

of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The

Jperators forths conaln are taken from [isson80] and are repeated e10w

PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY

PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)

STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)

LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)

For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec

e

51

STACK(XZ)

~ CSTACKCYZ) PICKLPOi)

PCTDOW(Y)

Figure 49 Diseovered (alto-operator

system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy

vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be

~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex

45 Eltperiment 5 Data Abstraction and Feature Formation

EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he

cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3

arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng

on the same Tu3S Instruments Eplorer as the SLBDCE system

Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les

given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores

to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in

the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of

178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J

5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C

SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c

61

--

~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll

Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~

hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH

program searCles for tWo tpes of substrucure In tbe blocks world domain The first type

involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a

set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by

the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes

of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall

1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he

ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of

SlBDLE

lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S

cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce

OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence

~ ~ shyB

I ~ (

E G- c c=7 0 IA B CD

i Lr

(a) ( )

63

A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1

0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL

E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL

The tommcm-realiotships-list contairs the four relations Ossessed by more than half the

candidates

Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt

~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le

Sumber of propUiH In Intltwto

Number of propenlH n Ilnlon

vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-

ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre

A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5

65

~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e

ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~

difficult Running both tbe sequence and common relation discovery processes nore tban once

mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl

difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd

domain

T~e final comFarlson of 11e two syseS Involves ~he representation used to store the

discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the

subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe

sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot

lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1

roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse

Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines

comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe

semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto

oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0

SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert

background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l

more general epresentaticn such as the semantic network used y ~he ARCH pro~ran

53 COfDitive Optimization in olrs S~R Program

R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte

~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt

[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S

Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-

-(5-s)C =---shy

s

slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s

represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1

polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a

inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR

seeks a grammar whose eiementS m8llnize eVe

S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n

322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number

of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f

OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings

of he substrucre can be upressed as

Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy

cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not

conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on

he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into

~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture

lmprovements to tbe cognltive savings measure

~ Substructure Oiscovery ill Uihitehalls PLAD System

Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons

These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn

69

The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as

Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs

efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda

overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta

mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence

Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ

COXTEXT 1 ogsav macro OS

100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot

8 5 ~u - (~tmiddot z)middot

Instantiated Action Sequlnce lt B 112 Z ~11J Z

COlEXT 2 ogsav macro~s

20 ~l bull (~lumiddot Z)shy

Instantiated Action ~uence A B 11

Al interesting macrops discovered End

Figure 53 PLAXD Etam~le

1

OLPTER 6

COSC1USION

Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S

of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty

Url about such an environnent the entity nust abstract over unInuresting jetJil and discover

substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s

teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced

domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l

this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve

61 Summary

The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl

onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s

aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e

environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge

equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl

aleratves nvoiec In 3 computational method 01 discovering sustrucure

A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y

he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans

omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg

t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre

imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5

T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~

iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~

he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~

optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~

The ~ialization module modina tile substructure to apply in a more constrained ewironme

Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i

nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S

aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f

be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes

and the possible applications of the SLBDLE substructure discovery system in a vanety f

domaltls

Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf

~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-

lass vf problems in the area of nachine learning OJeraton In real-world domains demands f

eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates

n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll

of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes

Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1

S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~

62 Future Research

$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE

system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei

include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS

Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy

reVIOUS suess of silIlLu rJe appiiauns

esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of

substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can

tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]

lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can

be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions

in both the background knowledge architecture 3nd the opeations peforned by the discovery

algorlthm

Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess

may arise from a better method of substrlctie instantiation C rrent letbocs do not

satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation

by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C

lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres

exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba

Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods

would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery

process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt

the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be

discovered with a single application of the discovery algcnthr1

Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~

uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve

the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery

rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge

z~ess l ~

One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount

knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a

knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts

623 Applications

Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste

SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and

compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung

systems

One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage

composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e

eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon

slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of

abstraction

The n~ to access information fom larse databases has betooe oonlonlac In ncs

omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so

do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver

repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard

Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte

database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture

sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1

9

REFERE~CES

[Doyie79J

[HoaS3]

L --]~ ason

~Iicalski33bl

G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)

J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162

J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27

R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288

K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987

W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933

W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947

J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J

General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16

J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977

D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC

~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239

R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy

R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13

R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363

S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3

T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50

J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp

~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~

83

- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es

DeJ IS 11

dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T

s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11

nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11

race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T

Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order

=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je

~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form

(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I

The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC

The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure

ltela01s s the list ~f ~eatlons deining le substrucure r occurenc

In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage

dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec

85

gt bull ~

)~c1 tt I

0 c5 ~ -

13 Output for First Example of Experiment 1 Limit - 6

3qll rilebullbullbullbull

anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~

Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll

S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi

-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I

(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l

Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD

-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i

S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)

11 aCC

A14 Output for First Example of E(periment 1 Limit 10

gt svIdue liait 10)

n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t

Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco

T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~

S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO

~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS

~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J

A 16 Input for Second Example ot Experiment 1

Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J

conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)

Al7 Output for Second Example ot Etperitnent 1 Limit - 4

l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I

5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs

c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I

middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt

middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I

So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS

coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j

ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103

~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~

89

S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a

ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO

~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H

ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09

lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()

~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull

[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I

l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i

[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~

C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i

SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1

co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i

middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))

Ec ~lce

Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10

Betti ~Ice

i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~

umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1

SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -

1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G

91

s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no

TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~

~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f

=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ

~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i

A2 Experiment 2

Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd

saostructure background knowledge nodule Two examples from the chemical dooain ae

eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng

previously disc ered subst-uc~ures to subsequent discovery tasks

11 Input for First Eumple of poundXperimenl 2

kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i

S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))

slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)

sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J

~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI

Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))

VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)

( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I

(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i

(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I

sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)

Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j

SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)

~o e ) ~Wlnt ubstUC~oltes

Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo

95

gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~

U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l

~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J

carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )

Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))

Defxollrii ilill J ~cIl Clr carJ)

I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)

Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J

ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))

A32 Output for Experiment 3

gt lubclue inl~ JO)

Seli tlcebullbull_

m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil

SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo

aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot

ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~

licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)

Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc

wri OCCtilJtOiCS

101

4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4

~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i

(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))

42 Output for Experiment 4

PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil

Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)

ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl

u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1

O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~

s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI

W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -

~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a

S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1

TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a

103

~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~

Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-

bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)

ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)

I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))

r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l

(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)

(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)

coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)

Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i

(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at

Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e

(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ

~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj

OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs

I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten

~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull

9

SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1

I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1

Addina substructures 0 SKbullbullbull

O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1

5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I

A3 Experimeut 3

Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15

hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le

desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e

99

sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~

5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~

1~Ie c2 lt6 ~u)

QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32

c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)

lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J

SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))

A52 Output for Experiment S

gt (sulxh bullbull lmu )

Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll

RUMinl sulntucture discovey

DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl

Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~

WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11

( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I

(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I

[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt

~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90

Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J

ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot

~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I

10$

bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D

icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)

([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD

((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I

([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~

101

middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a

oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0

CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1

Ed ~lCe

109

Page 4: DISCOVERING SUBSTRUCTURE IN EXAMPLES

Lawrence Bruce Holder Jr 1S Department of Computer Science

tniversity of Illinois at trbana-Champaign 1988 Robert Earl Stepp m Advisor

T~llS thesis describes a method for disltovering substructure oncepts in uamples The method

Involves a computationally constrained best-first search guieed by four heuristics cognitive

savings compactness connectivity and coverage Each heuristic s described in detail along with its

role In ealuatng an individual substructure concept The SlBOlE system that implements the

method contains a substructure discovery module a substructure specialization module and an

mcemenal substructure background knowledge module for applying previously discovered

substructure concepts The substructure background knowledge includes both user-defined and

SLBDLE-discovered substructures in a hierarchy that is used to determine which substructures art

present in the input eumples The system bas performed well on a number of eumples from

dderent domains and has discovered many interesting substructure conceU sucb as an aromatic

rrg and a macro-operator for stacking blocks The nethod and implementation of the SlBDLE

sstem are described and an analysis of experimental results is resented

iii

IJould like 0 thank my advisor Robert Stepp for ~S uoerous onrbutors tJ te etas

within this thesIs His talent for Insigbtful thinking and patient communicatlon has nade c

work both enjoyable and rewarding

I would also like to thanJc Brad Whiteball Bob Reinke Bharat Rao Diane COOk and Bob 1041

for their comments and suggestions regarding this work Tlanks also to the rest of the Artficai

Intelligence Researlh Group at the Coordinated Science Laboratory especially to my office mate

~ott Benr~ett for his patient participation in our discussions Special thanks to our s~e3r

Fancle Bridges for her ever jleasant assistance

Finally I would like to thank my family for their enthusiastic support and encouragement

brougcout my academic endeavors

This researCl was partIally supported by the ationa1 Science Foundation under grant SF

IST-85-11170 the Offe of ~aval Researcll under grant ~OOO14-82-Kmiddot0186 by Defense Advanced

Research ProJ~ts Agency under grant ~OOOl$-ampi-K-Oamp7$ and by a gIft from Texas rnstrumenu

Inc

T vlLE OF CO-TETS

CHAPTER

1 ITRODlCTIO

2 SlBSTRlcrtRE DISCQER- ~ ~ 5 21 1otivations 6 22 Substructure ReprSentatlon

I

23 Substructure Gone-ation 11 2- Substructure Soeleclon 13 25 Substructure Instantiatlon 17 26 Substructure Specialization 19 21 Background Knowledge 21 2S Substructure Dlsovery ~lgorltln

3 THE SlBDLE SYSTE)( 25 31 Substructure Representation 26 32 Heurlstlc-Based Substructure Discovery 25

321 GeneratIon 25 322 Selection 29 323 ExamFle 32

33 Substructure Specialization 35 331 Specialization Algorithm 35 332 Esampie 37

34 Substructure Background Knowledge 39

3~1 ~rchltecture -10 342 Identifying Substructures in Et~nplts -13

EXPERI(E-n -16

41 Experiment 1 VarYlng the Computational limit -16 ~2 Experiment 2 Specialization and Background Knowledge 19 3 Experiment 3 DisJverln~ Classifying A~trlbues In )lultilie xamplS 53 - Experiment~ Dls~vertng Iacro-Operators in Proof Trees 56 S EXet1ment S Data Abstraction and Feature Forllatlon 59

c REL~TED VORK 62 51 Gestalt Pshoiogy 62

52 Discovenrg Grop5 f Objects In Winstgtnmiddots ARCH P-cgram 63 ~3 Cvintlve OptlrlzJt~n in Wolffs SPR Pcgrar 67 S Substucure Dtsoery In middotbitalis PL-D System 69

CH~TER

6 COCL tSIO

6 1 Suoary () 2 Fmiddottue Resea-cb ~ _ ~

611 Substucture DiSCOvery and InstantIation 6

622 Background Knowledge 75 623 ApplicatIOns 79

REFERECES 51

APPEDIX A EXPERIYIET D-T~ 5amp AI Experiment 1 56 A2 Expenment 2 93

~ 3 Experllcent 3 99

~ Etperiment J 102 AS Experiment 5 104

vi

QPTER 1

LTRODtCI10N

At any Iven moment the amount of detailed information available fr~m an envlronmel s

overwhelmIng For elample dose observation of a brick wall reveals not only tbe rOINS d

rectang11ar brIckS but also tbe mortAr between tbe bricks the pitted surface of the brIcks and

mortar small cracks In tbe bricks etc Yet hUmans bave the ability to Ignore such detail ard

ettract infermation from the environment at a level of detail that is appropriate for he purpose cf

tbe ocservatlon Even n an unfamiliar environment lumans ignore intricate det1il and iscert

more abstract patterns in tle stimuli of the environment [Witkin83] This thesis describes a

com~utatonal mettod for discovermg abstract patterns or substruct~re in the descriptions 0 a

strucured enVrgtnment

Vhen obser atlons at varying levels of detail are necessary ~uruns are caable of descendrg

nto be more minute structure of the environmental stimuli and centlfymg patterns In arms f

hese stl1cural primitlves From the observations at different levels of detail humans na

construct a lle-archical description of the envirlnoent For nstance tbe brIckmiddot loll an be

descrbed as he rectangular bricks in tbe wall along with tbe interconnecons betmiddotveen tle mcilts

Furthermore each rick can be described in terms of the pitS racks and embedded graIns in he

bnck and each interconnecion can be described by the cl11pcnelts oi h~ mortar that ombme ~

form tle IntercoMecion Thus each Ievei in ~he descrption ~ierarchy rerresents I different eel

of detail for represen~ing ~he environment

Once such a bleoarchy is constructed similar prmitive struc res in diffeent middotWlCrmels

nay suggest tle Uistence of mere abstract featres higher ill n he hlerarhy Tls s~rltl

1

nty reFrese1ng the s~osJc~lre ~S slmpLfrg tergra lU lttl-lt~S I- lcdr e

1ewly ciscoveed substr-cure is s~iaiized by aire~jng adcli0ral s~rmiddotJ~ re T1S )lta ~

s~=stJcture descees l ~ager nore sptcfic poton of the -If~rt set of llut ltumies --e

suostJctU baclqrunc knowledge rcludes botll user-defined and discovered subs~J

definitions These are used to identify instances of substructures tilat exist In a sIVer set of ili

erampies Chapter 2 concludes by outlining a substructure discovery algortlm Incorporatng ~he

aforementlcned cmponents

Chapter 3 describes the implementation of the substructure discovery nethodoiogy ontained

In the StmiddotBDtmiddotE system In addition to the substructure discovery module StBDLE also lcludes a

substructure specaliZltion module and a substructure background know1edge module After a

substructure is d~oveed the substructure lS specIalized and botil tile original and secaiized

substructures are stored in tle background knowlCdge Within the background iinomiddotlledge the

substructures are kept in a hierarclly tbat defines omplu substructures in terms of more primltve

substructures tpon receiving a new set of input examples the background know~dgt rnodJe

determines which of the stored suostructures are resent in the etamples ThJs as he SLBDLE

system runs a hie3rcllical representation of selected structures found in he env ironnent ~s

constructed within tlle background knowledge modJle

Chapter presents several experiments witll the StBDLE systec EXFements wall tne

s1bstructure discovery module indicate that the suostucture generation and substructure seiecton

processes triorm vell in guiding the search towards more interestln~ substructures Other

experiments incorporating botll the substructure background keowledge module lnd spetallutlon

module demonstate tbe performance improvement obtaned by using revlousiy earned

sulstructures In subsequent discovery tASks The input and output data for each exerlent are

isted in ApperdlI A

Chapter strleys -revious work -elated to suostJcure disc)ery T~lS ~orK ilS ak l be euly 1900s Imiddotlen gestalt psychologists began sucying the InderYI~g ncess5 Lmiddotec i

3

CHAPTER 2

stBSTRUCTtJRE DISCOVERY

Te major focus of t1is work IS to investigate methods for discoveing substructure concepts

in examples Substructure dlScovery is tle process of identfYlng concepts desclbl1g n~ere~tIni

and repetitive -hunks- of structure within the individual elements of a set of structured examples

Once such 3 substructure oncept s discovered the descrIptions of the examples can be Simplified

by replacing all occurences of ~he substructure ith a single form that represents the

substructure The simplified descriptions may then be passed to other learning syst~ms In

lddition tbe discovery process nay be apFlied repetitively to urther simplify the deserlptions or

to build a hierarchical interpreuticn of the uamples in terms of hell subparts

ThIS hapter presents ~he components involved in a cgtmilItatlonal nethod for SiJostrJcture

discovery SKtion 21 disc1sses the motivations for developing suet a method anc he importance

of substructure discovery in the 5eld of artificial intelligence Seton 22 1eftnes a stbn~crWt

along with the components comprising a substructure [etods for geneatlng alte-1ative

substructures are presented 1ft Section 21 and the tK1niques used for inteHigently selecting among

these alternatives are discussed in Section 24 Once an appropriate substructure 15 discovered the

substructure is irurantUud for ncb occur-ence of the substructure in the input examples Sect~on

2 discusses substructure irstantiatl0ft ewly discovered sUOstuetures ~an be ipecialized to

describe larger substruc~ures in the input eumples Section 26 disc~sses suosrlctlre

s~aization Section 27 presents methods for using background lo~ledge in the SIoStrlctlre

dlscovery process Finllly Section 28 outlines a substluure Es-oery alsolthm lncorpcrawg

the previously described t~hni~ues

the observaton By tercelvmg only he appropriate Informatior lumans U~ bener abie to el-

the concepts implied by the environment

Thus the undedyu~ motivations for investigating substructure ~overy are the IblqlJlto~

hlenrchical structure in the world and 1e ease with Ntllch bumans can negotiate the henrchy

With an appropriate representation for substructures and the substJcture hierarchy a

substructure discovery sysem can perform the asks Just presented

22 Substructure Representation

A substructure is both a portion of a collection of structurally-related obects and a

cesclption of the concept represented by that portion For uample the detaied structure

CJmposlng ttle concet of a brIck is a substucure of a brck wall Tle collecion of atoms ard

bones compnslng he concept of an aromatic -ing 5 a substrucure of nany Clrgarllc ~cmpouncs

However substrucure concepts are ~ot always nteresng LPOl encountenng aJr1ck wal the

concept of a brick nay not be as interesting as tohe oncept Jf a doorway or Window T~e task of

SUls-ructure discovery is to ~nd i11lDurirtg subsluctue middotoncepts In a given 5-cicltlon d

structurally-related objects The representation of strllcured oncepts sbould be onducive to tile

wk of discovering substructures This sectior resents a grapblcal repfese~tatlon for suctufee

conceptS and a language for describing tne concepts

A coUlCtion of strJctured objects C3n be epfesentec by 1 sTaph CSlng il gnpl

representation a substJctlre is a collection of annotated nodes and eages comt=rlslng a onne~ed

slt~rlph J a larier jTaph The nodes epresenl single oects aics or eiltltols in the

Slbstuc~le lnd tlle edges -epresent the ion~lCtilrs emiddotmiddoteen ea~cns lnd ler arglrents Fr

annctated Ntb gt and o~e In~tl~ed Nltll gtft Te en odt s rnected c ~tl te a ngtce Jnd te

SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St

elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute

Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e

Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the

nput nampie n Figure 211 IS

lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt

red-I ill

green-i SI ~ ~------------~~~lt- --- OBJECT-0001

) ----Rl

1amp --- OBJECT-0002 amp-blue

I~l 154- red LshyI I

I j

blue blue

(1) I1put Example ()) SubstJcure

Fig1Jre 21 Exa~plt Substrlc~re

9

[nput Example Substlcture

A)

V

(b) Object Overlapping Substructure

(c) Object and Relation Overlapping Substructure

Figure 22 Disjoint and Overlappin Substructures

Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~

~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the

densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle

lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1

factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl

Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e

(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing

subsuIctures and substructure speialization to attach contextual nformation to newiy discovered

substructures

2S Substructure mstampl1tialioD

Once an interesting substructure is disltovered the input eumples can be recast by replactng

the obJects and relations of each occurrence of the substructure with a single entity representing

tbe abstact substructure This replacement is termed substructure illstanliatwlt After

pltrforming one substructure instantiation a substrueture discovery system may continue to

discover more abstract substructures in terms of those already instantiated in the input eumples

This section considers two methods of substrueture instantiation

One method of substructure instantiation replaees eaeb occurrence by a single object

Difficulties in using this method arise from representation roblems Although all the ob]fCts and

relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not

replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be

maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only

confounds the insuntiaion problem In the case of object overlap each instartation of the

overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of

relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as

well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre

instantiation using a singie new object Although single object substructure instantiation seens

intuitively promising tbe accompanying representation problems are difficult to overcome

An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of

the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the

slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the

17

3 SubStructure Generation

AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~

and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l

bottom-up and top-down

The bottom-up approach to substructure generation be~lns with the smallest substrJctures n

the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished

by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0

the substructure For enmple according to tbe tllee neighboring relations (presented in Section

2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the

substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures

lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt

lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt

lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt

The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O

S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For

example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10

suestrJures

ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt

The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie

suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0

smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be

11

lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1

jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_

uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly

dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec

substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1

are applicable here

Regardless of how tae metbods are applied each metbod bas advantages and disadvantages

Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev

SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle

combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a

larger substructure but a smaller more desirable substructure may be overlooked in the process

Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS

referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones

linimal expansion begins with smaller substructures expands substructures along one relation

lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource

imlts of tbe system

2~ Substructure Selection

Aier using tbe metbods of the previous sectlon to construct l set of alternatlve

substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0

onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e

roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie

substructures oased on ~her heuristic quality This section presents the major heuristics that 3re

Jpplicable to substructure evaluation

The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata

cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie

savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e

13

Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad

eiatlons of ~be origmal OCClrre1lces may be reconstructea

T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=

replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla

obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue

ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of

concepts deAned In teClS of the Instantiated substructures

26 Substructure Specialization

Specializing substructures is an essential capabtlity of a substructure discovery system For

Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f

mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences

Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the

aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached

atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this

specialization step allows the substructure discovery system to take advantage of additional

information in the input examples avoid learning overly general substructure concepts and Clore

rapidly discover a specific disjunctive substructure concet T115 section resents an approach to

substructure specialization

One technique for specializing a substructure is to ~rform inductive Inference on the

extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the

substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended

occurrences consists of the substructures obtained by upanding each occurrence llth each

neighboring relation of the occurrence Once the set of eltended occurrences is onstructed

Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added

neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg

relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres

19

2 - Background Knowledampe

Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s

wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e

discovery process along more promising paths through tbe space of alternatlve substructJres T~lS

section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate

substructures that are more likely to list in tbe current application dOClaln

The ~wo major functions of tbe substructure backround knowledge are to main tain boh

lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a

glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of

substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu

substructures are defined in terms of more primitive substructures This arrangement suggests ~n

archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie

substructures serve as the justifications for more complex substructures at a higher level in tle

hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote

substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie

substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures

elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying

tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput

uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are

ultimately justified by the relations in tbe input eumples The substructure discoey roess

may then use these background knowledge substructures as a starting point for ieneratllg

alternative substructures

Other f Jrms of background knowledge apply to the task of substrlcure ilscovery

mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses

~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee

1

L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do

BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do

if (not (member e 0raquo S S U leI

return D

Figure 24 SubstrUcture Discovery ~lgoritbm

Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom

the substructures In S until either the computational limlt L is exceeded or the set of candidate

substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe

heuristics of Section 2A are employed to choose the best substructure from the llternatives in S

The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he

implementation Section 322 descibes the comrutatlon used in StBDlE although otbe

methods may be used For instance the heutlStics could be weighted selected by lhe user or

selected by lhe backcround knowledae However uy substructure selecion method should

involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in

BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of

discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi

Section 23 are then used to construct a set of new substructures that are stored in E Those

subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the

loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res

23

CHAPTER 3

mE SUBDUE SYSTE~I

An implementation of he substructure discovery methodology descrIbed in the frevous

chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments

Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a

substructure concept discoverer and as a mod le in a more robust nachlne learning system In

addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a

sbstructure specialization module and a substructure background knowledge module for utiiizilg

prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd

krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres

ovillCh of these substruc1res are resent in the lnput examples

i

SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of

SubstructJre Speeializer-k of(

C ser-Supplied Input Exampies Substructure Discovery

Figure 31 Tlle SlSDLE System

(a) SUbstructure

[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]

(b) External Representation

I I

~ Sl

i V

~-T t -

I

~ Cl i

(c) Inte~al Represe~tation

Figure 32 Substructure Representation

21

SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do

SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI

return -EWSlBS

Figure 33 Substructure Elpanslon Algorithm

Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae

stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are

stored in For each neighboring relation a new substructure is formed by adding the nelghborrg

-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to

the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een

considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes

~xanded from the original substructure

322 Selection

The heuristic-based substructure discovery module selects for orslderation those

substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs

compactness connectivity and covenge These four heurlstus are used 0 order he set of

alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut

ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~

selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons

~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he

eurstlc value of a substructur

29

urlent set of Input xam-llS

deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e

substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le

Input examples

r rtlations Sl compactnns(S) = ----shy

For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus

conpactness bull 4 bull 1

The third heUristic connectivity measures the amount of extemal connection in 1he

occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of

external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be

onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed

connectivit-Spound) = locel

Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In

Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure

22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO

outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons

Thus connectivity (12 41 bull 13

T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es

31

ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt

First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~

tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as

there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle

substructure as denned in Section 3U The Object names within the substructures le aroltrar

symbols generated by the system for eacb ewly constructed substructure

S bull i lt 11 OBJECT-0001gt f

~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S

and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by

addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5

ae st)red In E

Input Example Substructure

amp i 511

~I Rl I- lSi ~

52 I S I

LJ L

Figure 3-1 Simple Example

33

11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~

1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l

le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~

suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~

by the algorabm

33 Sllbstructure Specialization

The substructure specialiution module In SLBDLE employs t simple teChlllq ue r

s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25

SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added

attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e

)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh

the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e

i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota

eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d

knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e

s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton

rocess

331 Specia1ization Algorithm

Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te

algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~

elamrles

he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~

Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l

osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~

eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e

35

~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor

Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle

mount of specializatlon In Srpc

A-rount_of_spKlaLization( S_) = --_______ (occl

The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture

acKiround knowledge along w1tb the originally discovered substrucure

332 Example

As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a

lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute

-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he

sare substruc~ure emerges as in Figure 3-

S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt

ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule

Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES

A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l

~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli

s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt

Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~

SPECSLBS In thIs example IS

S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt

T~is specialized substructure also las J occurrerces but 3 untque values thus

amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC

specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he

substrucnre background knowledge along with the oriiinally discovered substucur

SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon

about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be

s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred

substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq

SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue

nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization

nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be

mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec

s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks

3 SubstMlcture Background Kllowledge

T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts

39

O T SHA ETRL~GLE SHAPE-SQLARE

Figure 37 Substructure Backgolnd KnowledgeEumpie

ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt

su~ported

As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge

lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle

ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s

SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y

Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute

est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt

he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn

n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and

~1

1--- ~ed v blue ~

i I shyI

I~

I

I

I

I

SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE

Fgure 39 Three Background Knoledge $ucstroJcJres

he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e

new substructu-es In terms of the substructures already known

3 bull 42 IdentifyiD1 Substructures in Eumples

As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-

~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___

~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I

i SH~PETRIAGlE i t

I I

I I I SHAPEmiddotSQlARE

[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]

Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]

Fig-ure 310 Substructure IdentiAcation Eumple

substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles

disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks

--

lll=H Eumple - r

i III

I I H

8r- shyj

V V

Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red

C Value(S)a8

L 6 a

al uelt S)-312

alueltS)-21

0 alue(S)96

middotalue(S)-11

6 I

bt

10 ~ V

- -

I

alue(S)middot31~ aue(S~-96 -- - middotalue(S)92

A I

j

aI ValueS)-312 I

I I

I bull

---

- -I

Vaiue(S)-103

rgure 41 First Etample of EXiIrinent 1

elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS

of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll

EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$

overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0

consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r

be Input examples

2 Ex~riment 2 Specialization and Background Knowledge

The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem

Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0

~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS

lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege

During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task

As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut

discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs

acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa

Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed

subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk

The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure

-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e

cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s

shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese

discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized

substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed

rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces

the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed

substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe

background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to

deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe

background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres

Ii

C 0

H- C C - Ii C1

(Sr C1 rl

C C ~

C C C-H C C- ii C C-Ii 1 i

~ Sr- C C C

H-C C-Ii H C

Ii H

H

fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule

13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples

tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~

of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna

clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn

whIch to focus ile attentIon of the conceptual clusterI~ rocess

A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA

Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer

lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific

Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e

added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features

t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and

CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered

Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe

SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas

r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single

)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d

subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle

substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure

spannng ncre han one ualple

Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni

and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle

53

nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy

descrlOed il [10gensel87J the two best d usterngs discovered are

Color of engine Aheels IS Blackmiddot middotWhitemiddot

W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd

subs-~c~~re discovery =odule is

lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt

In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By

addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI

exa~Fles Je ~wo best lusterngs discovered lre

umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four

Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec

JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure

attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen

har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE

T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~

suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the

StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne

IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree

that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators

within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover

macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~

CiscJsses similrlties and differences between SLBDLE and PLoSD

Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure

of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The

Jperators forths conaln are taken from [isson80] and are repeated e10w

PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY

PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)

STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)

LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)

For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec

e

51

STACK(XZ)

~ CSTACKCYZ) PICKLPOi)

PCTDOW(Y)

Figure 49 Diseovered (alto-operator

system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy

vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be

~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex

45 Eltperiment 5 Data Abstraction and Feature Formation

EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he

cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3

arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng

on the same Tu3S Instruments Eplorer as the SLBDCE system

Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les

given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores

to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in

the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of

178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J

5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C

SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c

61

--

~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll

Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~

hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH

program searCles for tWo tpes of substrucure In tbe blocks world domain The first type

involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a

set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by

the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes

of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall

1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he

ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of

SlBDLE

lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S

cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce

OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence

~ ~ shyB

I ~ (

E G- c c=7 0 IA B CD

i Lr

(a) ( )

63

A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1

0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL

E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL

The tommcm-realiotships-list contairs the four relations Ossessed by more than half the

candidates

Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt

~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le

Sumber of propUiH In Intltwto

Number of propenlH n Ilnlon

vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-

ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre

A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5

65

~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e

ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~

difficult Running both tbe sequence and common relation discovery processes nore tban once

mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl

difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd

domain

T~e final comFarlson of 11e two syseS Involves ~he representation used to store the

discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the

subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe

sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot

lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1

roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse

Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines

comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe

semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto

oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0

SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert

background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l

more general epresentaticn such as the semantic network used y ~he ARCH pro~ran

53 COfDitive Optimization in olrs S~R Program

R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte

~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt

[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S

Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-

-(5-s)C =---shy

s

slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s

represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1

polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a

inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR

seeks a grammar whose eiementS m8llnize eVe

S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n

322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number

of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f

OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings

of he substrucre can be upressed as

Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy

cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not

conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on

he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into

~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture

lmprovements to tbe cognltive savings measure

~ Substructure Oiscovery ill Uihitehalls PLAD System

Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons

These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn

69

The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as

Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs

efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda

overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta

mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence

Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ

COXTEXT 1 ogsav macro OS

100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot

8 5 ~u - (~tmiddot z)middot

Instantiated Action Sequlnce lt B 112 Z ~11J Z

COlEXT 2 ogsav macro~s

20 ~l bull (~lumiddot Z)shy

Instantiated Action ~uence A B 11

Al interesting macrops discovered End

Figure 53 PLAXD Etam~le

1

OLPTER 6

COSC1USION

Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S

of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty

Url about such an environnent the entity nust abstract over unInuresting jetJil and discover

substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s

teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced

domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l

this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve

61 Summary

The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl

onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s

aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e

environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge

equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl

aleratves nvoiec In 3 computational method 01 discovering sustrucure

A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y

he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans

omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg

t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre

imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5

T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~

iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~

he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~

optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~

The ~ialization module modina tile substructure to apply in a more constrained ewironme

Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i

nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S

aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f

be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes

and the possible applications of the SLBDLE substructure discovery system in a vanety f

domaltls

Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf

~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-

lass vf problems in the area of nachine learning OJeraton In real-world domains demands f

eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates

n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll

of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes

Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1

S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~

62 Future Research

$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE

system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei

include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS

Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy

reVIOUS suess of silIlLu rJe appiiauns

esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of

substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can

tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]

lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can

be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions

in both the background knowledge architecture 3nd the opeations peforned by the discovery

algorlthm

Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess

may arise from a better method of substrlctie instantiation C rrent letbocs do not

satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation

by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C

lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres

exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba

Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods

would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery

process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt

the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be

discovered with a single application of the discovery algcnthr1

Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~

uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve

the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery

rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge

z~ess l ~

One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount

knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a

knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts

623 Applications

Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste

SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and

compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung

systems

One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage

composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e

eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon

slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of

abstraction

The n~ to access information fom larse databases has betooe oonlonlac In ncs

omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so

do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver

repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard

Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte

database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture

sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1

9

REFERE~CES

[Doyie79J

[HoaS3]

L --]~ ason

~Iicalski33bl

G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)

J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162

J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27

R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288

K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987

W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933

W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947

J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J

General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16

J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977

D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC

~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239

R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy

R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13

R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363

S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3

T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50

J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp

~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~

83

- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es

DeJ IS 11

dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T

s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11

nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11

race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T

Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order

=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je

~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form

(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I

The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC

The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure

ltela01s s the list ~f ~eatlons deining le substrucure r occurenc

In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage

dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec

85

gt bull ~

)~c1 tt I

0 c5 ~ -

13 Output for First Example of Experiment 1 Limit - 6

3qll rilebullbullbullbull

anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~

Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll

S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi

-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I

(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l

Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD

-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i

S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)

11 aCC

A14 Output for First Example of E(periment 1 Limit 10

gt svIdue liait 10)

n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t

Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco

T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~

S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO

~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS

~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J

A 16 Input for Second Example ot Experiment 1

Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J

conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)

Al7 Output for Second Example ot Etperitnent 1 Limit - 4

l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I

5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs

c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I

middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt

middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I

So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS

coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j

ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103

~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~

89

S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a

ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO

~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H

ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09

lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()

~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull

[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I

l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i

[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~

C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i

SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1

co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i

middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))

Ec ~lce

Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10

Betti ~Ice

i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~

umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1

SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -

1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G

91

s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no

TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~

~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f

=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ

~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i

A2 Experiment 2

Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd

saostructure background knowledge nodule Two examples from the chemical dooain ae

eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng

previously disc ered subst-uc~ures to subsequent discovery tasks

11 Input for First Eumple of poundXperimenl 2

kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i

S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))

slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)

sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J

~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI

Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))

VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)

( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I

(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i

(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I

sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)

Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j

SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)

~o e ) ~Wlnt ubstUC~oltes

Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo

95

gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~

U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l

~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J

carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )

Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))

Defxollrii ilill J ~cIl Clr carJ)

I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)

Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J

ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))

A32 Output for Experiment 3

gt lubclue inl~ JO)

Seli tlcebullbull_

m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil

SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo

aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot

ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~

licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)

Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc

wri OCCtilJtOiCS

101

4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4

~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i

(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))

42 Output for Experiment 4

PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil

Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)

ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl

u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1

O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~

s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI

W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -

~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a

S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1

TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a

103

~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~

Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-

bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)

ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)

I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))

r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l

(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)

(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)

coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)

Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i

(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at

Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e

(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ

~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj

OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs

I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten

~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull

9

SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1

I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1

Addina substructures 0 SKbullbullbull

O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1

5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I

A3 Experimeut 3

Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15

hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le

desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e

99

sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~

5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~

1~Ie c2 lt6 ~u)

QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32

c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)

lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J

SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))

A52 Output for Experiment S

gt (sulxh bullbull lmu )

Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll

RUMinl sulntucture discovey

DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl

Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~

WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11

( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I

(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I

[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt

~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90

Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J

ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot

~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I

10$

bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D

icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)

([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD

((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I

([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~

101

middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a

oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0

CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1

Ed ~lCe

109

Page 5: DISCOVERING SUBSTRUCTURE IN EXAMPLES

IJould like 0 thank my advisor Robert Stepp for ~S uoerous onrbutors tJ te etas

within this thesIs His talent for Insigbtful thinking and patient communicatlon has nade c

work both enjoyable and rewarding

I would also like to thanJc Brad Whiteball Bob Reinke Bharat Rao Diane COOk and Bob 1041

for their comments and suggestions regarding this work Tlanks also to the rest of the Artficai

Intelligence Researlh Group at the Coordinated Science Laboratory especially to my office mate

~ott Benr~ett for his patient participation in our discussions Special thanks to our s~e3r

Fancle Bridges for her ever jleasant assistance

Finally I would like to thank my family for their enthusiastic support and encouragement

brougcout my academic endeavors

This researCl was partIally supported by the ationa1 Science Foundation under grant SF

IST-85-11170 the Offe of ~aval Researcll under grant ~OOO14-82-Kmiddot0186 by Defense Advanced

Research ProJ~ts Agency under grant ~OOOl$-ampi-K-Oamp7$ and by a gIft from Texas rnstrumenu

Inc

T vlLE OF CO-TETS

CHAPTER

1 ITRODlCTIO

2 SlBSTRlcrtRE DISCQER- ~ ~ 5 21 1otivations 6 22 Substructure ReprSentatlon

I

23 Substructure Gone-ation 11 2- Substructure Soeleclon 13 25 Substructure Instantiatlon 17 26 Substructure Specialization 19 21 Background Knowledge 21 2S Substructure Dlsovery ~lgorltln

3 THE SlBDLE SYSTE)( 25 31 Substructure Representation 26 32 Heurlstlc-Based Substructure Discovery 25

321 GeneratIon 25 322 Selection 29 323 ExamFle 32

33 Substructure Specialization 35 331 Specialization Algorithm 35 332 Esampie 37

34 Substructure Background Knowledge 39

3~1 ~rchltecture -10 342 Identifying Substructures in Et~nplts -13

EXPERI(E-n -16

41 Experiment 1 VarYlng the Computational limit -16 ~2 Experiment 2 Specialization and Background Knowledge 19 3 Experiment 3 DisJverln~ Classifying A~trlbues In )lultilie xamplS 53 - Experiment~ Dls~vertng Iacro-Operators in Proof Trees 56 S EXet1ment S Data Abstraction and Feature Forllatlon 59

c REL~TED VORK 62 51 Gestalt Pshoiogy 62

52 Discovenrg Grop5 f Objects In Winstgtnmiddots ARCH P-cgram 63 ~3 Cvintlve OptlrlzJt~n in Wolffs SPR Pcgrar 67 S Substucure Dtsoery In middotbitalis PL-D System 69

CH~TER

6 COCL tSIO

6 1 Suoary () 2 Fmiddottue Resea-cb ~ _ ~

611 Substucture DiSCOvery and InstantIation 6

622 Background Knowledge 75 623 ApplicatIOns 79

REFERECES 51

APPEDIX A EXPERIYIET D-T~ 5amp AI Experiment 1 56 A2 Expenment 2 93

~ 3 Experllcent 3 99

~ Etperiment J 102 AS Experiment 5 104

vi

QPTER 1

LTRODtCI10N

At any Iven moment the amount of detailed information available fr~m an envlronmel s

overwhelmIng For elample dose observation of a brick wall reveals not only tbe rOINS d

rectang11ar brIckS but also tbe mortAr between tbe bricks the pitted surface of the brIcks and

mortar small cracks In tbe bricks etc Yet hUmans bave the ability to Ignore such detail ard

ettract infermation from the environment at a level of detail that is appropriate for he purpose cf

tbe ocservatlon Even n an unfamiliar environment lumans ignore intricate det1il and iscert

more abstract patterns in tle stimuli of the environment [Witkin83] This thesis describes a

com~utatonal mettod for discovermg abstract patterns or substruct~re in the descriptions 0 a

strucured enVrgtnment

Vhen obser atlons at varying levels of detail are necessary ~uruns are caable of descendrg

nto be more minute structure of the environmental stimuli and centlfymg patterns In arms f

hese stl1cural primitlves From the observations at different levels of detail humans na

construct a lle-archical description of the envirlnoent For nstance tbe brIckmiddot loll an be

descrbed as he rectangular bricks in tbe wall along with tbe interconnecons betmiddotveen tle mcilts

Furthermore each rick can be described in terms of the pitS racks and embedded graIns in he

bnck and each interconnecion can be described by the cl11pcnelts oi h~ mortar that ombme ~

form tle IntercoMecion Thus each Ievei in ~he descrption ~ierarchy rerresents I different eel

of detail for represen~ing ~he environment

Once such a bleoarchy is constructed similar prmitive struc res in diffeent middotWlCrmels

nay suggest tle Uistence of mere abstract featres higher ill n he hlerarhy Tls s~rltl

1

nty reFrese1ng the s~osJc~lre ~S slmpLfrg tergra lU lttl-lt~S I- lcdr e

1ewly ciscoveed substr-cure is s~iaiized by aire~jng adcli0ral s~rmiddotJ~ re T1S )lta ~

s~=stJcture descees l ~ager nore sptcfic poton of the -If~rt set of llut ltumies --e

suostJctU baclqrunc knowledge rcludes botll user-defined and discovered subs~J

definitions These are used to identify instances of substructures tilat exist In a sIVer set of ili

erampies Chapter 2 concludes by outlining a substructure discovery algortlm Incorporatng ~he

aforementlcned cmponents

Chapter 3 describes the implementation of the substructure discovery nethodoiogy ontained

In the StmiddotBDtmiddotE system In addition to the substructure discovery module StBDLE also lcludes a

substructure specaliZltion module and a substructure background know1edge module After a

substructure is d~oveed the substructure lS specIalized and botil tile original and secaiized

substructures are stored in tle background knowlCdge Within the background iinomiddotlledge the

substructures are kept in a hierarclly tbat defines omplu substructures in terms of more primltve

substructures tpon receiving a new set of input examples the background know~dgt rnodJe

determines which of the stored suostructures are resent in the etamples ThJs as he SLBDLE

system runs a hie3rcllical representation of selected structures found in he env ironnent ~s

constructed within tlle background knowledge modJle

Chapter presents several experiments witll the StBDLE systec EXFements wall tne

s1bstructure discovery module indicate that the suostucture generation and substructure seiecton

processes triorm vell in guiding the search towards more interestln~ substructures Other

experiments incorporating botll the substructure background keowledge module lnd spetallutlon

module demonstate tbe performance improvement obtaned by using revlousiy earned

sulstructures In subsequent discovery tASks The input and output data for each exerlent are

isted in ApperdlI A

Chapter strleys -revious work -elated to suostJcure disc)ery T~lS ~orK ilS ak l be euly 1900s Imiddotlen gestalt psychologists began sucying the InderYI~g ncess5 Lmiddotec i

3

CHAPTER 2

stBSTRUCTtJRE DISCOVERY

Te major focus of t1is work IS to investigate methods for discoveing substructure concepts

in examples Substructure dlScovery is tle process of identfYlng concepts desclbl1g n~ere~tIni

and repetitive -hunks- of structure within the individual elements of a set of structured examples

Once such 3 substructure oncept s discovered the descrIptions of the examples can be Simplified

by replacing all occurences of ~he substructure ith a single form that represents the

substructure The simplified descriptions may then be passed to other learning syst~ms In

lddition tbe discovery process nay be apFlied repetitively to urther simplify the deserlptions or

to build a hierarchical interpreuticn of the uamples in terms of hell subparts

ThIS hapter presents ~he components involved in a cgtmilItatlonal nethod for SiJostrJcture

discovery SKtion 21 disc1sses the motivations for developing suet a method anc he importance

of substructure discovery in the 5eld of artificial intelligence Seton 22 1eftnes a stbn~crWt

along with the components comprising a substructure [etods for geneatlng alte-1ative

substructures are presented 1ft Section 21 and the tK1niques used for inteHigently selecting among

these alternatives are discussed in Section 24 Once an appropriate substructure 15 discovered the

substructure is irurantUud for ncb occur-ence of the substructure in the input examples Sect~on

2 discusses substructure irstantiatl0ft ewly discovered sUOstuetures ~an be ipecialized to

describe larger substruc~ures in the input eumples Section 26 disc~sses suosrlctlre

s~aization Section 27 presents methods for using background lo~ledge in the SIoStrlctlre

dlscovery process Finllly Section 28 outlines a substluure Es-oery alsolthm lncorpcrawg

the previously described t~hni~ues

the observaton By tercelvmg only he appropriate Informatior lumans U~ bener abie to el-

the concepts implied by the environment

Thus the undedyu~ motivations for investigating substructure ~overy are the IblqlJlto~

hlenrchical structure in the world and 1e ease with Ntllch bumans can negotiate the henrchy

With an appropriate representation for substructures and the substJcture hierarchy a

substructure discovery sysem can perform the asks Just presented

22 Substructure Representation

A substructure is both a portion of a collection of structurally-related obects and a

cesclption of the concept represented by that portion For uample the detaied structure

CJmposlng ttle concet of a brIck is a substucure of a brck wall Tle collecion of atoms ard

bones compnslng he concept of an aromatic -ing 5 a substrucure of nany Clrgarllc ~cmpouncs

However substrucure concepts are ~ot always nteresng LPOl encountenng aJr1ck wal the

concept of a brick nay not be as interesting as tohe oncept Jf a doorway or Window T~e task of

SUls-ructure discovery is to ~nd i11lDurirtg subsluctue middotoncepts In a given 5-cicltlon d

structurally-related objects The representation of strllcured oncepts sbould be onducive to tile

wk of discovering substructures This sectior resents a grapblcal repfese~tatlon for suctufee

conceptS and a language for describing tne concepts

A coUlCtion of strJctured objects C3n be epfesentec by 1 sTaph CSlng il gnpl

representation a substJctlre is a collection of annotated nodes and eages comt=rlslng a onne~ed

slt~rlph J a larier jTaph The nodes epresenl single oects aics or eiltltols in the

Slbstuc~le lnd tlle edges -epresent the ion~lCtilrs emiddotmiddoteen ea~cns lnd ler arglrents Fr

annctated Ntb gt and o~e In~tl~ed Nltll gtft Te en odt s rnected c ~tl te a ngtce Jnd te

SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St

elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute

Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e

Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the

nput nampie n Figure 211 IS

lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt

red-I ill

green-i SI ~ ~------------~~~lt- --- OBJECT-0001

) ----Rl

1amp --- OBJECT-0002 amp-blue

I~l 154- red LshyI I

I j

blue blue

(1) I1put Example ()) SubstJcure

Fig1Jre 21 Exa~plt Substrlc~re

9

[nput Example Substlcture

A)

V

(b) Object Overlapping Substructure

(c) Object and Relation Overlapping Substructure

Figure 22 Disjoint and Overlappin Substructures

Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~

~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the

densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle

lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1

factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl

Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e

(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing

subsuIctures and substructure speialization to attach contextual nformation to newiy discovered

substructures

2S Substructure mstampl1tialioD

Once an interesting substructure is disltovered the input eumples can be recast by replactng

the obJects and relations of each occurrence of the substructure with a single entity representing

tbe abstact substructure This replacement is termed substructure illstanliatwlt After

pltrforming one substructure instantiation a substrueture discovery system may continue to

discover more abstract substructures in terms of those already instantiated in the input eumples

This section considers two methods of substrueture instantiation

One method of substructure instantiation replaees eaeb occurrence by a single object

Difficulties in using this method arise from representation roblems Although all the ob]fCts and

relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not

replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be

maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only

confounds the insuntiaion problem In the case of object overlap each instartation of the

overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of

relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as

well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre

instantiation using a singie new object Although single object substructure instantiation seens

intuitively promising tbe accompanying representation problems are difficult to overcome

An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of

the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the

slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the

17

3 SubStructure Generation

AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~

and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l

bottom-up and top-down

The bottom-up approach to substructure generation be~lns with the smallest substrJctures n

the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished

by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0

the substructure For enmple according to tbe tllee neighboring relations (presented in Section

2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the

substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures

lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt

lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt

lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt

The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O

S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For

example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10

suestrJures

ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt

The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie

suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0

smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be

11

lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1

jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_

uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly

dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec

substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1

are applicable here

Regardless of how tae metbods are applied each metbod bas advantages and disadvantages

Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev

SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle

combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a

larger substructure but a smaller more desirable substructure may be overlooked in the process

Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS

referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones

linimal expansion begins with smaller substructures expands substructures along one relation

lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource

imlts of tbe system

2~ Substructure Selection

Aier using tbe metbods of the previous sectlon to construct l set of alternatlve

substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0

onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e

roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie

substructures oased on ~her heuristic quality This section presents the major heuristics that 3re

Jpplicable to substructure evaluation

The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata

cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie

savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e

13

Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad

eiatlons of ~be origmal OCClrre1lces may be reconstructea

T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=

replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla

obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue

ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of

concepts deAned In teClS of the Instantiated substructures

26 Substructure Specialization

Specializing substructures is an essential capabtlity of a substructure discovery system For

Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f

mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences

Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the

aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached

atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this

specialization step allows the substructure discovery system to take advantage of additional

information in the input examples avoid learning overly general substructure concepts and Clore

rapidly discover a specific disjunctive substructure concet T115 section resents an approach to

substructure specialization

One technique for specializing a substructure is to ~rform inductive Inference on the

extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the

substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended

occurrences consists of the substructures obtained by upanding each occurrence llth each

neighboring relation of the occurrence Once the set of eltended occurrences is onstructed

Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added

neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg

relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres

19

2 - Background Knowledampe

Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s

wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e

discovery process along more promising paths through tbe space of alternatlve substructJres T~lS

section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate

substructures that are more likely to list in tbe current application dOClaln

The ~wo major functions of tbe substructure backround knowledge are to main tain boh

lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a

glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of

substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu

substructures are defined in terms of more primitive substructures This arrangement suggests ~n

archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie

substructures serve as the justifications for more complex substructures at a higher level in tle

hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote

substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie

substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures

elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying

tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput

uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are

ultimately justified by the relations in tbe input eumples The substructure discoey roess

may then use these background knowledge substructures as a starting point for ieneratllg

alternative substructures

Other f Jrms of background knowledge apply to the task of substrlcure ilscovery

mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses

~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee

1

L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do

BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do

if (not (member e 0raquo S S U leI

return D

Figure 24 SubstrUcture Discovery ~lgoritbm

Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom

the substructures In S until either the computational limlt L is exceeded or the set of candidate

substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe

heuristics of Section 2A are employed to choose the best substructure from the llternatives in S

The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he

implementation Section 322 descibes the comrutatlon used in StBDlE although otbe

methods may be used For instance the heutlStics could be weighted selected by lhe user or

selected by lhe backcround knowledae However uy substructure selecion method should

involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in

BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of

discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi

Section 23 are then used to construct a set of new substructures that are stored in E Those

subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the

loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res

23

CHAPTER 3

mE SUBDUE SYSTE~I

An implementation of he substructure discovery methodology descrIbed in the frevous

chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments

Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a

substructure concept discoverer and as a mod le in a more robust nachlne learning system In

addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a

sbstructure specialization module and a substructure background knowledge module for utiiizilg

prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd

krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres

ovillCh of these substruc1res are resent in the lnput examples

i

SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of

SubstructJre Speeializer-k of(

C ser-Supplied Input Exampies Substructure Discovery

Figure 31 Tlle SlSDLE System

(a) SUbstructure

[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]

(b) External Representation

I I

~ Sl

i V

~-T t -

I

~ Cl i

(c) Inte~al Represe~tation

Figure 32 Substructure Representation

21

SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do

SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI

return -EWSlBS

Figure 33 Substructure Elpanslon Algorithm

Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae

stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are

stored in For each neighboring relation a new substructure is formed by adding the nelghborrg

-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to

the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een

considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes

~xanded from the original substructure

322 Selection

The heuristic-based substructure discovery module selects for orslderation those

substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs

compactness connectivity and covenge These four heurlstus are used 0 order he set of

alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut

ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~

selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons

~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he

eurstlc value of a substructur

29

urlent set of Input xam-llS

deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e

substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le

Input examples

r rtlations Sl compactnns(S) = ----shy

For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus

conpactness bull 4 bull 1

The third heUristic connectivity measures the amount of extemal connection in 1he

occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of

external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be

onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed

connectivit-Spound) = locel

Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In

Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure

22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO

outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons

Thus connectivity (12 41 bull 13

T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es

31

ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt

First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~

tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as

there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle

substructure as denned in Section 3U The Object names within the substructures le aroltrar

symbols generated by the system for eacb ewly constructed substructure

S bull i lt 11 OBJECT-0001gt f

~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S

and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by

addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5

ae st)red In E

Input Example Substructure

amp i 511

~I Rl I- lSi ~

52 I S I

LJ L

Figure 3-1 Simple Example

33

11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~

1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l

le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~

suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~

by the algorabm

33 Sllbstructure Specialization

The substructure specialiution module In SLBDLE employs t simple teChlllq ue r

s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25

SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added

attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e

)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh

the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e

i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota

eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d

knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e

s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton

rocess

331 Specia1ization Algorithm

Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te

algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~

elamrles

he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~

Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l

osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~

eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e

35

~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor

Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle

mount of specializatlon In Srpc

A-rount_of_spKlaLization( S_) = --_______ (occl

The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture

acKiround knowledge along w1tb the originally discovered substrucure

332 Example

As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a

lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute

-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he

sare substruc~ure emerges as in Figure 3-

S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt

ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule

Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES

A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l

~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli

s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt

Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~

SPECSLBS In thIs example IS

S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt

T~is specialized substructure also las J occurrerces but 3 untque values thus

amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC

specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he

substrucnre background knowledge along with the oriiinally discovered substucur

SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon

about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be

s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred

substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq

SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue

nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization

nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be

mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec

s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks

3 SubstMlcture Background Kllowledge

T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts

39

O T SHA ETRL~GLE SHAPE-SQLARE

Figure 37 Substructure Backgolnd KnowledgeEumpie

ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt

su~ported

As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge

lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle

ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s

SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y

Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute

est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt

he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn

n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and

~1

1--- ~ed v blue ~

i I shyI

I~

I

I

I

I

SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE

Fgure 39 Three Background Knoledge $ucstroJcJres

he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e

new substructu-es In terms of the substructures already known

3 bull 42 IdentifyiD1 Substructures in Eumples

As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-

~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___

~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I

i SH~PETRIAGlE i t

I I

I I I SHAPEmiddotSQlARE

[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]

Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]

Fig-ure 310 Substructure IdentiAcation Eumple

substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles

disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks

--

lll=H Eumple - r

i III

I I H

8r- shyj

V V

Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red

C Value(S)a8

L 6 a

al uelt S)-312

alueltS)-21

0 alue(S)96

middotalue(S)-11

6 I

bt

10 ~ V

- -

I

alue(S)middot31~ aue(S~-96 -- - middotalue(S)92

A I

j

aI ValueS)-312 I

I I

I bull

---

- -I

Vaiue(S)-103

rgure 41 First Etample of EXiIrinent 1

elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS

of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll

EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$

overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0

consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r

be Input examples

2 Ex~riment 2 Specialization and Background Knowledge

The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem

Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0

~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS

lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege

During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task

As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut

discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs

acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa

Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed

subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk

The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure

-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e

cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s

shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese

discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized

substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed

rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces

the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed

substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe

background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to

deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe

background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres

Ii

C 0

H- C C - Ii C1

(Sr C1 rl

C C ~

C C C-H C C- ii C C-Ii 1 i

~ Sr- C C C

H-C C-Ii H C

Ii H

H

fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule

13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples

tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~

of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna

clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn

whIch to focus ile attentIon of the conceptual clusterI~ rocess

A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA

Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer

lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific

Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e

added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features

t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and

CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered

Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe

SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas

r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single

)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d

subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle

substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure

spannng ncre han one ualple

Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni

and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle

53

nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy

descrlOed il [10gensel87J the two best d usterngs discovered are

Color of engine Aheels IS Blackmiddot middotWhitemiddot

W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd

subs-~c~~re discovery =odule is

lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt

In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By

addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI

exa~Fles Je ~wo best lusterngs discovered lre

umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four

Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec

JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure

attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen

har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE

T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~

suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the

StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne

IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree

that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators

within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover

macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~

CiscJsses similrlties and differences between SLBDLE and PLoSD

Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure

of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The

Jperators forths conaln are taken from [isson80] and are repeated e10w

PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY

PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)

STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)

LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)

For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec

e

51

STACK(XZ)

~ CSTACKCYZ) PICKLPOi)

PCTDOW(Y)

Figure 49 Diseovered (alto-operator

system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy

vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be

~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex

45 Eltperiment 5 Data Abstraction and Feature Formation

EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he

cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3

arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng

on the same Tu3S Instruments Eplorer as the SLBDCE system

Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les

given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores

to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in

the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of

178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J

5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C

SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c

61

--

~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll

Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~

hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH

program searCles for tWo tpes of substrucure In tbe blocks world domain The first type

involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a

set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by

the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes

of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall

1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he

ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of

SlBDLE

lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S

cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce

OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence

~ ~ shyB

I ~ (

E G- c c=7 0 IA B CD

i Lr

(a) ( )

63

A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1

0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL

E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL

The tommcm-realiotships-list contairs the four relations Ossessed by more than half the

candidates

Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt

~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le

Sumber of propUiH In Intltwto

Number of propenlH n Ilnlon

vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-

ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre

A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5

65

~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e

ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~

difficult Running both tbe sequence and common relation discovery processes nore tban once

mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl

difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd

domain

T~e final comFarlson of 11e two syseS Involves ~he representation used to store the

discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the

subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe

sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot

lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1

roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse

Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines

comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe

semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto

oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0

SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert

background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l

more general epresentaticn such as the semantic network used y ~he ARCH pro~ran

53 COfDitive Optimization in olrs S~R Program

R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte

~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt

[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S

Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-

-(5-s)C =---shy

s

slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s

represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1

polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a

inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR

seeks a grammar whose eiementS m8llnize eVe

S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n

322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number

of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f

OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings

of he substrucre can be upressed as

Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy

cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not

conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on

he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into

~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture

lmprovements to tbe cognltive savings measure

~ Substructure Oiscovery ill Uihitehalls PLAD System

Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons

These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn

69

The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as

Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs

efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda

overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta

mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence

Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ

COXTEXT 1 ogsav macro OS

100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot

8 5 ~u - (~tmiddot z)middot

Instantiated Action Sequlnce lt B 112 Z ~11J Z

COlEXT 2 ogsav macro~s

20 ~l bull (~lumiddot Z)shy

Instantiated Action ~uence A B 11

Al interesting macrops discovered End

Figure 53 PLAXD Etam~le

1

OLPTER 6

COSC1USION

Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S

of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty

Url about such an environnent the entity nust abstract over unInuresting jetJil and discover

substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s

teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced

domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l

this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve

61 Summary

The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl

onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s

aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e

environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge

equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl

aleratves nvoiec In 3 computational method 01 discovering sustrucure

A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y

he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans

omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg

t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre

imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5

T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~

iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~

he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~

optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~

The ~ialization module modina tile substructure to apply in a more constrained ewironme

Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i

nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S

aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f

be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes

and the possible applications of the SLBDLE substructure discovery system in a vanety f

domaltls

Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf

~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-

lass vf problems in the area of nachine learning OJeraton In real-world domains demands f

eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates

n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll

of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes

Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1

S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~

62 Future Research

$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE

system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei

include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS

Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy

reVIOUS suess of silIlLu rJe appiiauns

esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of

substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can

tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]

lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can

be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions

in both the background knowledge architecture 3nd the opeations peforned by the discovery

algorlthm

Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess

may arise from a better method of substrlctie instantiation C rrent letbocs do not

satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation

by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C

lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres

exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba

Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods

would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery

process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt

the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be

discovered with a single application of the discovery algcnthr1

Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~

uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve

the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery

rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge

z~ess l ~

One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount

knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a

knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts

623 Applications

Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste

SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and

compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung

systems

One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage

composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e

eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon

slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of

abstraction

The n~ to access information fom larse databases has betooe oonlonlac In ncs

omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so

do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver

repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard

Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte

database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture

sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1

9

REFERE~CES

[Doyie79J

[HoaS3]

L --]~ ason

~Iicalski33bl

G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)

J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162

J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27

R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288

K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987

W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933

W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947

J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J

General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16

J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977

D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC

~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239

R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy

R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13

R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363

S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3

T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50

J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp

~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~

83

- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es

DeJ IS 11

dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T

s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11

nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11

race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T

Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order

=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je

~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form

(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I

The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC

The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure

ltela01s s the list ~f ~eatlons deining le substrucure r occurenc

In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage

dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec

85

gt bull ~

)~c1 tt I

0 c5 ~ -

13 Output for First Example of Experiment 1 Limit - 6

3qll rilebullbullbullbull

anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~

Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll

S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi

-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I

(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l

Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD

-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i

S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)

11 aCC

A14 Output for First Example of E(periment 1 Limit 10

gt svIdue liait 10)

n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t

Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco

T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~

S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO

~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS

~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J

A 16 Input for Second Example ot Experiment 1

Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J

conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)

Al7 Output for Second Example ot Etperitnent 1 Limit - 4

l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I

5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs

c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I

middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt

middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I

So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS

coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j

ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103

~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~

89

S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a

ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO

~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H

ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09

lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()

~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull

[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I

l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i

[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~

C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i

SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1

co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i

middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))

Ec ~lce

Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10

Betti ~Ice

i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~

umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1

SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -

1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G

91

s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no

TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~

~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f

=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ

~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i

A2 Experiment 2

Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd

saostructure background knowledge nodule Two examples from the chemical dooain ae

eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng

previously disc ered subst-uc~ures to subsequent discovery tasks

11 Input for First Eumple of poundXperimenl 2

kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i

S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))

slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)

sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J

~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI

Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))

VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)

( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I

(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i

(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I

sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)

Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j

SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)

~o e ) ~Wlnt ubstUC~oltes

Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo

95

gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~

U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l

~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J

carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )

Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))

Defxollrii ilill J ~cIl Clr carJ)

I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)

Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J

ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))

A32 Output for Experiment 3

gt lubclue inl~ JO)

Seli tlcebullbull_

m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil

SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo

aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot

ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~

licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)

Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc

wri OCCtilJtOiCS

101

4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4

~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i

(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))

42 Output for Experiment 4

PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil

Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)

ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl

u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1

O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~

s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI

W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -

~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a

S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1

TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a

103

~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~

Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-

bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)

ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)

I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))

r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l

(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)

(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)

coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)

Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i

(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at

Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e

(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ

~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj

OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs

I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten

~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull

9

SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1

I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1

Addina substructures 0 SKbullbullbull

O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1

5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I

A3 Experimeut 3

Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15

hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le

desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e

99

sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~

5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~

1~Ie c2 lt6 ~u)

QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32

c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)

lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J

SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))

A52 Output for Experiment S

gt (sulxh bullbull lmu )

Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll

RUMinl sulntucture discovey

DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl

Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~

WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11

( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I

(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I

[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt

~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90

Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J

ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot

~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I

10$

bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D

icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)

([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD

((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I

([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~

101

middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a

oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0

CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1

Ed ~lCe

109

Page 6: DISCOVERING SUBSTRUCTURE IN EXAMPLES

T vlLE OF CO-TETS

CHAPTER

1 ITRODlCTIO

2 SlBSTRlcrtRE DISCQER- ~ ~ 5 21 1otivations 6 22 Substructure ReprSentatlon

I

23 Substructure Gone-ation 11 2- Substructure Soeleclon 13 25 Substructure Instantiatlon 17 26 Substructure Specialization 19 21 Background Knowledge 21 2S Substructure Dlsovery ~lgorltln

3 THE SlBDLE SYSTE)( 25 31 Substructure Representation 26 32 Heurlstlc-Based Substructure Discovery 25

321 GeneratIon 25 322 Selection 29 323 ExamFle 32

33 Substructure Specialization 35 331 Specialization Algorithm 35 332 Esampie 37

34 Substructure Background Knowledge 39

3~1 ~rchltecture -10 342 Identifying Substructures in Et~nplts -13

EXPERI(E-n -16

41 Experiment 1 VarYlng the Computational limit -16 ~2 Experiment 2 Specialization and Background Knowledge 19 3 Experiment 3 DisJverln~ Classifying A~trlbues In )lultilie xamplS 53 - Experiment~ Dls~vertng Iacro-Operators in Proof Trees 56 S EXet1ment S Data Abstraction and Feature Forllatlon 59

c REL~TED VORK 62 51 Gestalt Pshoiogy 62

52 Discovenrg Grop5 f Objects In Winstgtnmiddots ARCH P-cgram 63 ~3 Cvintlve OptlrlzJt~n in Wolffs SPR Pcgrar 67 S Substucure Dtsoery In middotbitalis PL-D System 69

CH~TER

6 COCL tSIO

6 1 Suoary () 2 Fmiddottue Resea-cb ~ _ ~

611 Substucture DiSCOvery and InstantIation 6

622 Background Knowledge 75 623 ApplicatIOns 79

REFERECES 51

APPEDIX A EXPERIYIET D-T~ 5amp AI Experiment 1 56 A2 Expenment 2 93

~ 3 Experllcent 3 99

~ Etperiment J 102 AS Experiment 5 104

vi

QPTER 1

LTRODtCI10N

At any Iven moment the amount of detailed information available fr~m an envlronmel s

overwhelmIng For elample dose observation of a brick wall reveals not only tbe rOINS d

rectang11ar brIckS but also tbe mortAr between tbe bricks the pitted surface of the brIcks and

mortar small cracks In tbe bricks etc Yet hUmans bave the ability to Ignore such detail ard

ettract infermation from the environment at a level of detail that is appropriate for he purpose cf

tbe ocservatlon Even n an unfamiliar environment lumans ignore intricate det1il and iscert

more abstract patterns in tle stimuli of the environment [Witkin83] This thesis describes a

com~utatonal mettod for discovermg abstract patterns or substruct~re in the descriptions 0 a

strucured enVrgtnment

Vhen obser atlons at varying levels of detail are necessary ~uruns are caable of descendrg

nto be more minute structure of the environmental stimuli and centlfymg patterns In arms f

hese stl1cural primitlves From the observations at different levels of detail humans na

construct a lle-archical description of the envirlnoent For nstance tbe brIckmiddot loll an be

descrbed as he rectangular bricks in tbe wall along with tbe interconnecons betmiddotveen tle mcilts

Furthermore each rick can be described in terms of the pitS racks and embedded graIns in he

bnck and each interconnecion can be described by the cl11pcnelts oi h~ mortar that ombme ~

form tle IntercoMecion Thus each Ievei in ~he descrption ~ierarchy rerresents I different eel

of detail for represen~ing ~he environment

Once such a bleoarchy is constructed similar prmitive struc res in diffeent middotWlCrmels

nay suggest tle Uistence of mere abstract featres higher ill n he hlerarhy Tls s~rltl

1

nty reFrese1ng the s~osJc~lre ~S slmpLfrg tergra lU lttl-lt~S I- lcdr e

1ewly ciscoveed substr-cure is s~iaiized by aire~jng adcli0ral s~rmiddotJ~ re T1S )lta ~

s~=stJcture descees l ~ager nore sptcfic poton of the -If~rt set of llut ltumies --e

suostJctU baclqrunc knowledge rcludes botll user-defined and discovered subs~J

definitions These are used to identify instances of substructures tilat exist In a sIVer set of ili

erampies Chapter 2 concludes by outlining a substructure discovery algortlm Incorporatng ~he

aforementlcned cmponents

Chapter 3 describes the implementation of the substructure discovery nethodoiogy ontained

In the StmiddotBDtmiddotE system In addition to the substructure discovery module StBDLE also lcludes a

substructure specaliZltion module and a substructure background know1edge module After a

substructure is d~oveed the substructure lS specIalized and botil tile original and secaiized

substructures are stored in tle background knowlCdge Within the background iinomiddotlledge the

substructures are kept in a hierarclly tbat defines omplu substructures in terms of more primltve

substructures tpon receiving a new set of input examples the background know~dgt rnodJe

determines which of the stored suostructures are resent in the etamples ThJs as he SLBDLE

system runs a hie3rcllical representation of selected structures found in he env ironnent ~s

constructed within tlle background knowledge modJle

Chapter presents several experiments witll the StBDLE systec EXFements wall tne

s1bstructure discovery module indicate that the suostucture generation and substructure seiecton

processes triorm vell in guiding the search towards more interestln~ substructures Other

experiments incorporating botll the substructure background keowledge module lnd spetallutlon

module demonstate tbe performance improvement obtaned by using revlousiy earned

sulstructures In subsequent discovery tASks The input and output data for each exerlent are

isted in ApperdlI A

Chapter strleys -revious work -elated to suostJcure disc)ery T~lS ~orK ilS ak l be euly 1900s Imiddotlen gestalt psychologists began sucying the InderYI~g ncess5 Lmiddotec i

3

CHAPTER 2

stBSTRUCTtJRE DISCOVERY

Te major focus of t1is work IS to investigate methods for discoveing substructure concepts

in examples Substructure dlScovery is tle process of identfYlng concepts desclbl1g n~ere~tIni

and repetitive -hunks- of structure within the individual elements of a set of structured examples

Once such 3 substructure oncept s discovered the descrIptions of the examples can be Simplified

by replacing all occurences of ~he substructure ith a single form that represents the

substructure The simplified descriptions may then be passed to other learning syst~ms In

lddition tbe discovery process nay be apFlied repetitively to urther simplify the deserlptions or

to build a hierarchical interpreuticn of the uamples in terms of hell subparts

ThIS hapter presents ~he components involved in a cgtmilItatlonal nethod for SiJostrJcture

discovery SKtion 21 disc1sses the motivations for developing suet a method anc he importance

of substructure discovery in the 5eld of artificial intelligence Seton 22 1eftnes a stbn~crWt

along with the components comprising a substructure [etods for geneatlng alte-1ative

substructures are presented 1ft Section 21 and the tK1niques used for inteHigently selecting among

these alternatives are discussed in Section 24 Once an appropriate substructure 15 discovered the

substructure is irurantUud for ncb occur-ence of the substructure in the input examples Sect~on

2 discusses substructure irstantiatl0ft ewly discovered sUOstuetures ~an be ipecialized to

describe larger substruc~ures in the input eumples Section 26 disc~sses suosrlctlre

s~aization Section 27 presents methods for using background lo~ledge in the SIoStrlctlre

dlscovery process Finllly Section 28 outlines a substluure Es-oery alsolthm lncorpcrawg

the previously described t~hni~ues

the observaton By tercelvmg only he appropriate Informatior lumans U~ bener abie to el-

the concepts implied by the environment

Thus the undedyu~ motivations for investigating substructure ~overy are the IblqlJlto~

hlenrchical structure in the world and 1e ease with Ntllch bumans can negotiate the henrchy

With an appropriate representation for substructures and the substJcture hierarchy a

substructure discovery sysem can perform the asks Just presented

22 Substructure Representation

A substructure is both a portion of a collection of structurally-related obects and a

cesclption of the concept represented by that portion For uample the detaied structure

CJmposlng ttle concet of a brIck is a substucure of a brck wall Tle collecion of atoms ard

bones compnslng he concept of an aromatic -ing 5 a substrucure of nany Clrgarllc ~cmpouncs

However substrucure concepts are ~ot always nteresng LPOl encountenng aJr1ck wal the

concept of a brick nay not be as interesting as tohe oncept Jf a doorway or Window T~e task of

SUls-ructure discovery is to ~nd i11lDurirtg subsluctue middotoncepts In a given 5-cicltlon d

structurally-related objects The representation of strllcured oncepts sbould be onducive to tile

wk of discovering substructures This sectior resents a grapblcal repfese~tatlon for suctufee

conceptS and a language for describing tne concepts

A coUlCtion of strJctured objects C3n be epfesentec by 1 sTaph CSlng il gnpl

representation a substJctlre is a collection of annotated nodes and eages comt=rlslng a onne~ed

slt~rlph J a larier jTaph The nodes epresenl single oects aics or eiltltols in the

Slbstuc~le lnd tlle edges -epresent the ion~lCtilrs emiddotmiddoteen ea~cns lnd ler arglrents Fr

annctated Ntb gt and o~e In~tl~ed Nltll gtft Te en odt s rnected c ~tl te a ngtce Jnd te

SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St

elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute

Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e

Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the

nput nampie n Figure 211 IS

lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt

red-I ill

green-i SI ~ ~------------~~~lt- --- OBJECT-0001

) ----Rl

1amp --- OBJECT-0002 amp-blue

I~l 154- red LshyI I

I j

blue blue

(1) I1put Example ()) SubstJcure

Fig1Jre 21 Exa~plt Substrlc~re

9

[nput Example Substlcture

A)

V

(b) Object Overlapping Substructure

(c) Object and Relation Overlapping Substructure

Figure 22 Disjoint and Overlappin Substructures

Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~

~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the

densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle

lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1

factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl

Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e

(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing

subsuIctures and substructure speialization to attach contextual nformation to newiy discovered

substructures

2S Substructure mstampl1tialioD

Once an interesting substructure is disltovered the input eumples can be recast by replactng

the obJects and relations of each occurrence of the substructure with a single entity representing

tbe abstact substructure This replacement is termed substructure illstanliatwlt After

pltrforming one substructure instantiation a substrueture discovery system may continue to

discover more abstract substructures in terms of those already instantiated in the input eumples

This section considers two methods of substrueture instantiation

One method of substructure instantiation replaees eaeb occurrence by a single object

Difficulties in using this method arise from representation roblems Although all the ob]fCts and

relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not

replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be

maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only

confounds the insuntiaion problem In the case of object overlap each instartation of the

overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of

relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as

well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre

instantiation using a singie new object Although single object substructure instantiation seens

intuitively promising tbe accompanying representation problems are difficult to overcome

An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of

the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the

slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the

17

3 SubStructure Generation

AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~

and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l

bottom-up and top-down

The bottom-up approach to substructure generation be~lns with the smallest substrJctures n

the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished

by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0

the substructure For enmple according to tbe tllee neighboring relations (presented in Section

2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the

substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures

lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt

lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt

lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt

The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O

S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For

example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10

suestrJures

ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt

The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie

suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0

smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be

11

lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1

jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_

uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly

dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec

substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1

are applicable here

Regardless of how tae metbods are applied each metbod bas advantages and disadvantages

Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev

SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle

combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a

larger substructure but a smaller more desirable substructure may be overlooked in the process

Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS

referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones

linimal expansion begins with smaller substructures expands substructures along one relation

lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource

imlts of tbe system

2~ Substructure Selection

Aier using tbe metbods of the previous sectlon to construct l set of alternatlve

substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0

onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e

roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie

substructures oased on ~her heuristic quality This section presents the major heuristics that 3re

Jpplicable to substructure evaluation

The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata

cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie

savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e

13

Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad

eiatlons of ~be origmal OCClrre1lces may be reconstructea

T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=

replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla

obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue

ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of

concepts deAned In teClS of the Instantiated substructures

26 Substructure Specialization

Specializing substructures is an essential capabtlity of a substructure discovery system For

Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f

mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences

Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the

aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached

atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this

specialization step allows the substructure discovery system to take advantage of additional

information in the input examples avoid learning overly general substructure concepts and Clore

rapidly discover a specific disjunctive substructure concet T115 section resents an approach to

substructure specialization

One technique for specializing a substructure is to ~rform inductive Inference on the

extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the

substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended

occurrences consists of the substructures obtained by upanding each occurrence llth each

neighboring relation of the occurrence Once the set of eltended occurrences is onstructed

Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added

neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg

relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres

19

2 - Background Knowledampe

Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s

wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e

discovery process along more promising paths through tbe space of alternatlve substructJres T~lS

section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate

substructures that are more likely to list in tbe current application dOClaln

The ~wo major functions of tbe substructure backround knowledge are to main tain boh

lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a

glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of

substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu

substructures are defined in terms of more primitive substructures This arrangement suggests ~n

archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie

substructures serve as the justifications for more complex substructures at a higher level in tle

hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote

substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie

substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures

elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying

tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput

uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are

ultimately justified by the relations in tbe input eumples The substructure discoey roess

may then use these background knowledge substructures as a starting point for ieneratllg

alternative substructures

Other f Jrms of background knowledge apply to the task of substrlcure ilscovery

mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses

~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee

1

L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do

BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do

if (not (member e 0raquo S S U leI

return D

Figure 24 SubstrUcture Discovery ~lgoritbm

Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom

the substructures In S until either the computational limlt L is exceeded or the set of candidate

substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe

heuristics of Section 2A are employed to choose the best substructure from the llternatives in S

The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he

implementation Section 322 descibes the comrutatlon used in StBDlE although otbe

methods may be used For instance the heutlStics could be weighted selected by lhe user or

selected by lhe backcround knowledae However uy substructure selecion method should

involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in

BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of

discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi

Section 23 are then used to construct a set of new substructures that are stored in E Those

subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the

loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res

23

CHAPTER 3

mE SUBDUE SYSTE~I

An implementation of he substructure discovery methodology descrIbed in the frevous

chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments

Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a

substructure concept discoverer and as a mod le in a more robust nachlne learning system In

addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a

sbstructure specialization module and a substructure background knowledge module for utiiizilg

prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd

krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres

ovillCh of these substruc1res are resent in the lnput examples

i

SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of

SubstructJre Speeializer-k of(

C ser-Supplied Input Exampies Substructure Discovery

Figure 31 Tlle SlSDLE System

(a) SUbstructure

[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]

(b) External Representation

I I

~ Sl

i V

~-T t -

I

~ Cl i

(c) Inte~al Represe~tation

Figure 32 Substructure Representation

21

SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do

SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI

return -EWSlBS

Figure 33 Substructure Elpanslon Algorithm

Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae

stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are

stored in For each neighboring relation a new substructure is formed by adding the nelghborrg

-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to

the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een

considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes

~xanded from the original substructure

322 Selection

The heuristic-based substructure discovery module selects for orslderation those

substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs

compactness connectivity and covenge These four heurlstus are used 0 order he set of

alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut

ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~

selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons

~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he

eurstlc value of a substructur

29

urlent set of Input xam-llS

deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e

substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le

Input examples

r rtlations Sl compactnns(S) = ----shy

For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus

conpactness bull 4 bull 1

The third heUristic connectivity measures the amount of extemal connection in 1he

occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of

external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be

onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed

connectivit-Spound) = locel

Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In

Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure

22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO

outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons

Thus connectivity (12 41 bull 13

T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es

31

ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt

First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~

tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as

there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle

substructure as denned in Section 3U The Object names within the substructures le aroltrar

symbols generated by the system for eacb ewly constructed substructure

S bull i lt 11 OBJECT-0001gt f

~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S

and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by

addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5

ae st)red In E

Input Example Substructure

amp i 511

~I Rl I- lSi ~

52 I S I

LJ L

Figure 3-1 Simple Example

33

11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~

1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l

le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~

suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~

by the algorabm

33 Sllbstructure Specialization

The substructure specialiution module In SLBDLE employs t simple teChlllq ue r

s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25

SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added

attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e

)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh

the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e

i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota

eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d

knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e

s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton

rocess

331 Specia1ization Algorithm

Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te

algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~

elamrles

he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~

Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l

osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~

eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e

35

~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor

Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle

mount of specializatlon In Srpc

A-rount_of_spKlaLization( S_) = --_______ (occl

The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture

acKiround knowledge along w1tb the originally discovered substrucure

332 Example

As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a

lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute

-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he

sare substruc~ure emerges as in Figure 3-

S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt

ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule

Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES

A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l

~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli

s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt

Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~

SPECSLBS In thIs example IS

S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt

T~is specialized substructure also las J occurrerces but 3 untque values thus

amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC

specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he

substrucnre background knowledge along with the oriiinally discovered substucur

SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon

about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be

s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred

substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq

SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue

nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization

nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be

mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec

s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks

3 SubstMlcture Background Kllowledge

T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts

39

O T SHA ETRL~GLE SHAPE-SQLARE

Figure 37 Substructure Backgolnd KnowledgeEumpie

ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt

su~ported

As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge

lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle

ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s

SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y

Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute

est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt

he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn

n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and

~1

1--- ~ed v blue ~

i I shyI

I~

I

I

I

I

SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE

Fgure 39 Three Background Knoledge $ucstroJcJres

he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e

new substructu-es In terms of the substructures already known

3 bull 42 IdentifyiD1 Substructures in Eumples

As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-

~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___

~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I

i SH~PETRIAGlE i t

I I

I I I SHAPEmiddotSQlARE

[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]

Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]

Fig-ure 310 Substructure IdentiAcation Eumple

substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles

disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks

--

lll=H Eumple - r

i III

I I H

8r- shyj

V V

Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red

C Value(S)a8

L 6 a

al uelt S)-312

alueltS)-21

0 alue(S)96

middotalue(S)-11

6 I

bt

10 ~ V

- -

I

alue(S)middot31~ aue(S~-96 -- - middotalue(S)92

A I

j

aI ValueS)-312 I

I I

I bull

---

- -I

Vaiue(S)-103

rgure 41 First Etample of EXiIrinent 1

elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS

of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll

EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$

overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0

consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r

be Input examples

2 Ex~riment 2 Specialization and Background Knowledge

The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem

Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0

~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS

lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege

During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task

As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut

discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs

acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa

Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed

subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk

The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure

-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e

cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s

shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese

discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized

substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed

rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces

the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed

substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe

background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to

deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe

background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres

Ii

C 0

H- C C - Ii C1

(Sr C1 rl

C C ~

C C C-H C C- ii C C-Ii 1 i

~ Sr- C C C

H-C C-Ii H C

Ii H

H

fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule

13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples

tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~

of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna

clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn

whIch to focus ile attentIon of the conceptual clusterI~ rocess

A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA

Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer

lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific

Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e

added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features

t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and

CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered

Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe

SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas

r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single

)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d

subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle

substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure

spannng ncre han one ualple

Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni

and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle

53

nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy

descrlOed il [10gensel87J the two best d usterngs discovered are

Color of engine Aheels IS Blackmiddot middotWhitemiddot

W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd

subs-~c~~re discovery =odule is

lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt

In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By

addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI

exa~Fles Je ~wo best lusterngs discovered lre

umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four

Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec

JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure

attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen

har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE

T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~

suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the

StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne

IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree

that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators

within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover

macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~

CiscJsses similrlties and differences between SLBDLE and PLoSD

Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure

of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The

Jperators forths conaln are taken from [isson80] and are repeated e10w

PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY

PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)

STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)

LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)

For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec

e

51

STACK(XZ)

~ CSTACKCYZ) PICKLPOi)

PCTDOW(Y)

Figure 49 Diseovered (alto-operator

system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy

vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be

~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex

45 Eltperiment 5 Data Abstraction and Feature Formation

EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he

cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3

arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng

on the same Tu3S Instruments Eplorer as the SLBDCE system

Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les

given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores

to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in

the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of

178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J

5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C

SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c

61

--

~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll

Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~

hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH

program searCles for tWo tpes of substrucure In tbe blocks world domain The first type

involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a

set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by

the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes

of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall

1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he

ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of

SlBDLE

lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S

cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce

OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence

~ ~ shyB

I ~ (

E G- c c=7 0 IA B CD

i Lr

(a) ( )

63

A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1

0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL

E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL

The tommcm-realiotships-list contairs the four relations Ossessed by more than half the

candidates

Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt

~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le

Sumber of propUiH In Intltwto

Number of propenlH n Ilnlon

vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-

ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre

A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5

65

~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e

ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~

difficult Running both tbe sequence and common relation discovery processes nore tban once

mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl

difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd

domain

T~e final comFarlson of 11e two syseS Involves ~he representation used to store the

discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the

subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe

sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot

lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1

roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse

Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines

comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe

semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto

oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0

SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert

background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l

more general epresentaticn such as the semantic network used y ~he ARCH pro~ran

53 COfDitive Optimization in olrs S~R Program

R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte

~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt

[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S

Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-

-(5-s)C =---shy

s

slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s

represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1

polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a

inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR

seeks a grammar whose eiementS m8llnize eVe

S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n

322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number

of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f

OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings

of he substrucre can be upressed as

Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy

cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not

conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on

he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into

~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture

lmprovements to tbe cognltive savings measure

~ Substructure Oiscovery ill Uihitehalls PLAD System

Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons

These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn

69

The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as

Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs

efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda

overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta

mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence

Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ

COXTEXT 1 ogsav macro OS

100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot

8 5 ~u - (~tmiddot z)middot

Instantiated Action Sequlnce lt B 112 Z ~11J Z

COlEXT 2 ogsav macro~s

20 ~l bull (~lumiddot Z)shy

Instantiated Action ~uence A B 11

Al interesting macrops discovered End

Figure 53 PLAXD Etam~le

1

OLPTER 6

COSC1USION

Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S

of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty

Url about such an environnent the entity nust abstract over unInuresting jetJil and discover

substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s

teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced

domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l

this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve

61 Summary

The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl

onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s

aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e

environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge

equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl

aleratves nvoiec In 3 computational method 01 discovering sustrucure

A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y

he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans

omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg

t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre

imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5

T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~

iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~

he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~

optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~

The ~ialization module modina tile substructure to apply in a more constrained ewironme

Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i

nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S

aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f

be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes

and the possible applications of the SLBDLE substructure discovery system in a vanety f

domaltls

Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf

~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-

lass vf problems in the area of nachine learning OJeraton In real-world domains demands f

eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates

n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll

of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes

Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1

S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~

62 Future Research

$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE

system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei

include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS

Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy

reVIOUS suess of silIlLu rJe appiiauns

esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of

substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can

tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]

lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can

be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions

in both the background knowledge architecture 3nd the opeations peforned by the discovery

algorlthm

Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess

may arise from a better method of substrlctie instantiation C rrent letbocs do not

satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation

by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C

lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres

exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba

Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods

would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery

process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt

the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be

discovered with a single application of the discovery algcnthr1

Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~

uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve

the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery

rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge

z~ess l ~

One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount

knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a

knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts

623 Applications

Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste

SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and

compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung

systems

One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage

composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e

eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon

slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of

abstraction

The n~ to access information fom larse databases has betooe oonlonlac In ncs

omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so

do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver

repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard

Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte

database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture

sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1

9

REFERE~CES

[Doyie79J

[HoaS3]

L --]~ ason

~Iicalski33bl

G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)

J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162

J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27

R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288

K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987

W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933

W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947

J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J

General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16

J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977

D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC

~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239

R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy

R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13

R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363

S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3

T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50

J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp

~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~

83

- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es

DeJ IS 11

dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T

s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11

nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11

race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T

Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order

=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je

~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form

(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I

The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC

The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure

ltela01s s the list ~f ~eatlons deining le substrucure r occurenc

In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage

dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec

85

gt bull ~

)~c1 tt I

0 c5 ~ -

13 Output for First Example of Experiment 1 Limit - 6

3qll rilebullbullbullbull

anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~

Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll

S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi

-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I

(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l

Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD

-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i

S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)

11 aCC

A14 Output for First Example of E(periment 1 Limit 10

gt svIdue liait 10)

n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t

Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco

T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~

S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO

~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS

~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J

A 16 Input for Second Example ot Experiment 1

Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J

conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)

Al7 Output for Second Example ot Etperitnent 1 Limit - 4

l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I

5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs

c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I

middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt

middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I

So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS

coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j

ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103

~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~

89

S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a

ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO

~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H

ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09

lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()

~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull

[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I

l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i

[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~

C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i

SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1

co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i

middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))

Ec ~lce

Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10

Betti ~Ice

i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~

umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1

SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -

1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G

91

s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no

TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~

~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f

=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ

~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i

A2 Experiment 2

Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd

saostructure background knowledge nodule Two examples from the chemical dooain ae

eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng

previously disc ered subst-uc~ures to subsequent discovery tasks

11 Input for First Eumple of poundXperimenl 2

kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i

S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))

slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)

sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J

~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI

Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))

VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)

( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I

(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i

(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I

sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)

Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j

SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)

~o e ) ~Wlnt ubstUC~oltes

Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo

95

gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~

U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l

~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J

carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )

Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))

Defxollrii ilill J ~cIl Clr carJ)

I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)

Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J

ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))

A32 Output for Experiment 3

gt lubclue inl~ JO)

Seli tlcebullbull_

m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil

SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo

aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot

ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~

licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)

Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc

wri OCCtilJtOiCS

101

4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4

~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i

(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))

42 Output for Experiment 4

PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil

Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)

ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl

u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1

O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~

s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI

W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -

~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a

S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1

TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a

103

~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~

Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-

bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)

ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)

I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))

r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l

(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)

(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)

coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)

Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i

(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at

Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e

(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ

~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj

OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs

I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten

~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull

9

SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1

I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1

Addina substructures 0 SKbullbullbull

O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1

5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I

A3 Experimeut 3

Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15

hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le

desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e

99

sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~

5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~

1~Ie c2 lt6 ~u)

QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32

c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)

lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J

SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))

A52 Output for Experiment S

gt (sulxh bullbull lmu )

Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll

RUMinl sulntucture discovey

DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl

Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~

WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11

( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I

(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I

[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt

~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90

Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J

ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot

~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I

10$

bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D

icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)

([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD

((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I

([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~

101

middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a

oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0

CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1

Ed ~lCe

109

Page 7: DISCOVERING SUBSTRUCTURE IN EXAMPLES

CH~TER

6 COCL tSIO

6 1 Suoary () 2 Fmiddottue Resea-cb ~ _ ~

611 Substucture DiSCOvery and InstantIation 6

622 Background Knowledge 75 623 ApplicatIOns 79

REFERECES 51

APPEDIX A EXPERIYIET D-T~ 5amp AI Experiment 1 56 A2 Expenment 2 93

~ 3 Experllcent 3 99

~ Etperiment J 102 AS Experiment 5 104

vi

QPTER 1

LTRODtCI10N

At any Iven moment the amount of detailed information available fr~m an envlronmel s

overwhelmIng For elample dose observation of a brick wall reveals not only tbe rOINS d

rectang11ar brIckS but also tbe mortAr between tbe bricks the pitted surface of the brIcks and

mortar small cracks In tbe bricks etc Yet hUmans bave the ability to Ignore such detail ard

ettract infermation from the environment at a level of detail that is appropriate for he purpose cf

tbe ocservatlon Even n an unfamiliar environment lumans ignore intricate det1il and iscert

more abstract patterns in tle stimuli of the environment [Witkin83] This thesis describes a

com~utatonal mettod for discovermg abstract patterns or substruct~re in the descriptions 0 a

strucured enVrgtnment

Vhen obser atlons at varying levels of detail are necessary ~uruns are caable of descendrg

nto be more minute structure of the environmental stimuli and centlfymg patterns In arms f

hese stl1cural primitlves From the observations at different levels of detail humans na

construct a lle-archical description of the envirlnoent For nstance tbe brIckmiddot loll an be

descrbed as he rectangular bricks in tbe wall along with tbe interconnecons betmiddotveen tle mcilts

Furthermore each rick can be described in terms of the pitS racks and embedded graIns in he

bnck and each interconnecion can be described by the cl11pcnelts oi h~ mortar that ombme ~

form tle IntercoMecion Thus each Ievei in ~he descrption ~ierarchy rerresents I different eel

of detail for represen~ing ~he environment

Once such a bleoarchy is constructed similar prmitive struc res in diffeent middotWlCrmels

nay suggest tle Uistence of mere abstract featres higher ill n he hlerarhy Tls s~rltl

1

nty reFrese1ng the s~osJc~lre ~S slmpLfrg tergra lU lttl-lt~S I- lcdr e

1ewly ciscoveed substr-cure is s~iaiized by aire~jng adcli0ral s~rmiddotJ~ re T1S )lta ~

s~=stJcture descees l ~ager nore sptcfic poton of the -If~rt set of llut ltumies --e

suostJctU baclqrunc knowledge rcludes botll user-defined and discovered subs~J

definitions These are used to identify instances of substructures tilat exist In a sIVer set of ili

erampies Chapter 2 concludes by outlining a substructure discovery algortlm Incorporatng ~he

aforementlcned cmponents

Chapter 3 describes the implementation of the substructure discovery nethodoiogy ontained

In the StmiddotBDtmiddotE system In addition to the substructure discovery module StBDLE also lcludes a

substructure specaliZltion module and a substructure background know1edge module After a

substructure is d~oveed the substructure lS specIalized and botil tile original and secaiized

substructures are stored in tle background knowlCdge Within the background iinomiddotlledge the

substructures are kept in a hierarclly tbat defines omplu substructures in terms of more primltve

substructures tpon receiving a new set of input examples the background know~dgt rnodJe

determines which of the stored suostructures are resent in the etamples ThJs as he SLBDLE

system runs a hie3rcllical representation of selected structures found in he env ironnent ~s

constructed within tlle background knowledge modJle

Chapter presents several experiments witll the StBDLE systec EXFements wall tne

s1bstructure discovery module indicate that the suostucture generation and substructure seiecton

processes triorm vell in guiding the search towards more interestln~ substructures Other

experiments incorporating botll the substructure background keowledge module lnd spetallutlon

module demonstate tbe performance improvement obtaned by using revlousiy earned

sulstructures In subsequent discovery tASks The input and output data for each exerlent are

isted in ApperdlI A

Chapter strleys -revious work -elated to suostJcure disc)ery T~lS ~orK ilS ak l be euly 1900s Imiddotlen gestalt psychologists began sucying the InderYI~g ncess5 Lmiddotec i

3

CHAPTER 2

stBSTRUCTtJRE DISCOVERY

Te major focus of t1is work IS to investigate methods for discoveing substructure concepts

in examples Substructure dlScovery is tle process of identfYlng concepts desclbl1g n~ere~tIni

and repetitive -hunks- of structure within the individual elements of a set of structured examples

Once such 3 substructure oncept s discovered the descrIptions of the examples can be Simplified

by replacing all occurences of ~he substructure ith a single form that represents the

substructure The simplified descriptions may then be passed to other learning syst~ms In

lddition tbe discovery process nay be apFlied repetitively to urther simplify the deserlptions or

to build a hierarchical interpreuticn of the uamples in terms of hell subparts

ThIS hapter presents ~he components involved in a cgtmilItatlonal nethod for SiJostrJcture

discovery SKtion 21 disc1sses the motivations for developing suet a method anc he importance

of substructure discovery in the 5eld of artificial intelligence Seton 22 1eftnes a stbn~crWt

along with the components comprising a substructure [etods for geneatlng alte-1ative

substructures are presented 1ft Section 21 and the tK1niques used for inteHigently selecting among

these alternatives are discussed in Section 24 Once an appropriate substructure 15 discovered the

substructure is irurantUud for ncb occur-ence of the substructure in the input examples Sect~on

2 discusses substructure irstantiatl0ft ewly discovered sUOstuetures ~an be ipecialized to

describe larger substruc~ures in the input eumples Section 26 disc~sses suosrlctlre

s~aization Section 27 presents methods for using background lo~ledge in the SIoStrlctlre

dlscovery process Finllly Section 28 outlines a substluure Es-oery alsolthm lncorpcrawg

the previously described t~hni~ues

the observaton By tercelvmg only he appropriate Informatior lumans U~ bener abie to el-

the concepts implied by the environment

Thus the undedyu~ motivations for investigating substructure ~overy are the IblqlJlto~

hlenrchical structure in the world and 1e ease with Ntllch bumans can negotiate the henrchy

With an appropriate representation for substructures and the substJcture hierarchy a

substructure discovery sysem can perform the asks Just presented

22 Substructure Representation

A substructure is both a portion of a collection of structurally-related obects and a

cesclption of the concept represented by that portion For uample the detaied structure

CJmposlng ttle concet of a brIck is a substucure of a brck wall Tle collecion of atoms ard

bones compnslng he concept of an aromatic -ing 5 a substrucure of nany Clrgarllc ~cmpouncs

However substrucure concepts are ~ot always nteresng LPOl encountenng aJr1ck wal the

concept of a brick nay not be as interesting as tohe oncept Jf a doorway or Window T~e task of

SUls-ructure discovery is to ~nd i11lDurirtg subsluctue middotoncepts In a given 5-cicltlon d

structurally-related objects The representation of strllcured oncepts sbould be onducive to tile

wk of discovering substructures This sectior resents a grapblcal repfese~tatlon for suctufee

conceptS and a language for describing tne concepts

A coUlCtion of strJctured objects C3n be epfesentec by 1 sTaph CSlng il gnpl

representation a substJctlre is a collection of annotated nodes and eages comt=rlslng a onne~ed

slt~rlph J a larier jTaph The nodes epresenl single oects aics or eiltltols in the

Slbstuc~le lnd tlle edges -epresent the ion~lCtilrs emiddotmiddoteen ea~cns lnd ler arglrents Fr

annctated Ntb gt and o~e In~tl~ed Nltll gtft Te en odt s rnected c ~tl te a ngtce Jnd te

SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St

elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute

Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e

Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the

nput nampie n Figure 211 IS

lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt

red-I ill

green-i SI ~ ~------------~~~lt- --- OBJECT-0001

) ----Rl

1amp --- OBJECT-0002 amp-blue

I~l 154- red LshyI I

I j

blue blue

(1) I1put Example ()) SubstJcure

Fig1Jre 21 Exa~plt Substrlc~re

9

[nput Example Substlcture

A)

V

(b) Object Overlapping Substructure

(c) Object and Relation Overlapping Substructure

Figure 22 Disjoint and Overlappin Substructures

Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~

~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the

densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle

lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1

factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl

Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e

(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing

subsuIctures and substructure speialization to attach contextual nformation to newiy discovered

substructures

2S Substructure mstampl1tialioD

Once an interesting substructure is disltovered the input eumples can be recast by replactng

the obJects and relations of each occurrence of the substructure with a single entity representing

tbe abstact substructure This replacement is termed substructure illstanliatwlt After

pltrforming one substructure instantiation a substrueture discovery system may continue to

discover more abstract substructures in terms of those already instantiated in the input eumples

This section considers two methods of substrueture instantiation

One method of substructure instantiation replaees eaeb occurrence by a single object

Difficulties in using this method arise from representation roblems Although all the ob]fCts and

relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not

replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be

maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only

confounds the insuntiaion problem In the case of object overlap each instartation of the

overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of

relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as

well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre

instantiation using a singie new object Although single object substructure instantiation seens

intuitively promising tbe accompanying representation problems are difficult to overcome

An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of

the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the

slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the

17

3 SubStructure Generation

AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~

and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l

bottom-up and top-down

The bottom-up approach to substructure generation be~lns with the smallest substrJctures n

the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished

by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0

the substructure For enmple according to tbe tllee neighboring relations (presented in Section

2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the

substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures

lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt

lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt

lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt

The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O

S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For

example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10

suestrJures

ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt

The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie

suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0

smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be

11

lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1

jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_

uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly

dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec

substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1

are applicable here

Regardless of how tae metbods are applied each metbod bas advantages and disadvantages

Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev

SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle

combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a

larger substructure but a smaller more desirable substructure may be overlooked in the process

Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS

referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones

linimal expansion begins with smaller substructures expands substructures along one relation

lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource

imlts of tbe system

2~ Substructure Selection

Aier using tbe metbods of the previous sectlon to construct l set of alternatlve

substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0

onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e

roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie

substructures oased on ~her heuristic quality This section presents the major heuristics that 3re

Jpplicable to substructure evaluation

The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata

cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie

savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e

13

Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad

eiatlons of ~be origmal OCClrre1lces may be reconstructea

T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=

replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla

obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue

ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of

concepts deAned In teClS of the Instantiated substructures

26 Substructure Specialization

Specializing substructures is an essential capabtlity of a substructure discovery system For

Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f

mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences

Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the

aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached

atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this

specialization step allows the substructure discovery system to take advantage of additional

information in the input examples avoid learning overly general substructure concepts and Clore

rapidly discover a specific disjunctive substructure concet T115 section resents an approach to

substructure specialization

One technique for specializing a substructure is to ~rform inductive Inference on the

extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the

substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended

occurrences consists of the substructures obtained by upanding each occurrence llth each

neighboring relation of the occurrence Once the set of eltended occurrences is onstructed

Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added

neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg

relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres

19

2 - Background Knowledampe

Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s

wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e

discovery process along more promising paths through tbe space of alternatlve substructJres T~lS

section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate

substructures that are more likely to list in tbe current application dOClaln

The ~wo major functions of tbe substructure backround knowledge are to main tain boh

lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a

glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of

substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu

substructures are defined in terms of more primitive substructures This arrangement suggests ~n

archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie

substructures serve as the justifications for more complex substructures at a higher level in tle

hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote

substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie

substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures

elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying

tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput

uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are

ultimately justified by the relations in tbe input eumples The substructure discoey roess

may then use these background knowledge substructures as a starting point for ieneratllg

alternative substructures

Other f Jrms of background knowledge apply to the task of substrlcure ilscovery

mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses

~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee

1

L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do

BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do

if (not (member e 0raquo S S U leI

return D

Figure 24 SubstrUcture Discovery ~lgoritbm

Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom

the substructures In S until either the computational limlt L is exceeded or the set of candidate

substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe

heuristics of Section 2A are employed to choose the best substructure from the llternatives in S

The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he

implementation Section 322 descibes the comrutatlon used in StBDlE although otbe

methods may be used For instance the heutlStics could be weighted selected by lhe user or

selected by lhe backcround knowledae However uy substructure selecion method should

involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in

BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of

discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi

Section 23 are then used to construct a set of new substructures that are stored in E Those

subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the

loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res

23

CHAPTER 3

mE SUBDUE SYSTE~I

An implementation of he substructure discovery methodology descrIbed in the frevous

chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments

Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a

substructure concept discoverer and as a mod le in a more robust nachlne learning system In

addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a

sbstructure specialization module and a substructure background knowledge module for utiiizilg

prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd

krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres

ovillCh of these substruc1res are resent in the lnput examples

i

SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of

SubstructJre Speeializer-k of(

C ser-Supplied Input Exampies Substructure Discovery

Figure 31 Tlle SlSDLE System

(a) SUbstructure

[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]

(b) External Representation

I I

~ Sl

i V

~-T t -

I

~ Cl i

(c) Inte~al Represe~tation

Figure 32 Substructure Representation

21

SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do

SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI

return -EWSlBS

Figure 33 Substructure Elpanslon Algorithm

Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae

stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are

stored in For each neighboring relation a new substructure is formed by adding the nelghborrg

-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to

the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een

considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes

~xanded from the original substructure

322 Selection

The heuristic-based substructure discovery module selects for orslderation those

substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs

compactness connectivity and covenge These four heurlstus are used 0 order he set of

alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut

ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~

selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons

~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he

eurstlc value of a substructur

29

urlent set of Input xam-llS

deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e

substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le

Input examples

r rtlations Sl compactnns(S) = ----shy

For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus

conpactness bull 4 bull 1

The third heUristic connectivity measures the amount of extemal connection in 1he

occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of

external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be

onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed

connectivit-Spound) = locel

Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In

Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure

22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO

outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons

Thus connectivity (12 41 bull 13

T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es

31

ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt

First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~

tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as

there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle

substructure as denned in Section 3U The Object names within the substructures le aroltrar

symbols generated by the system for eacb ewly constructed substructure

S bull i lt 11 OBJECT-0001gt f

~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S

and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by

addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5

ae st)red In E

Input Example Substructure

amp i 511

~I Rl I- lSi ~

52 I S I

LJ L

Figure 3-1 Simple Example

33

11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~

1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l

le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~

suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~

by the algorabm

33 Sllbstructure Specialization

The substructure specialiution module In SLBDLE employs t simple teChlllq ue r

s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25

SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added

attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e

)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh

the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e

i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota

eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d

knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e

s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton

rocess

331 Specia1ization Algorithm

Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te

algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~

elamrles

he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~

Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l

osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~

eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e

35

~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor

Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle

mount of specializatlon In Srpc

A-rount_of_spKlaLization( S_) = --_______ (occl

The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture

acKiround knowledge along w1tb the originally discovered substrucure

332 Example

As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a

lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute

-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he

sare substruc~ure emerges as in Figure 3-

S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt

ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule

Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES

A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l

~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli

s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt

Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~

SPECSLBS In thIs example IS

S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt

T~is specialized substructure also las J occurrerces but 3 untque values thus

amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC

specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he

substrucnre background knowledge along with the oriiinally discovered substucur

SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon

about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be

s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred

substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq

SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue

nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization

nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be

mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec

s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks

3 SubstMlcture Background Kllowledge

T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts

39

O T SHA ETRL~GLE SHAPE-SQLARE

Figure 37 Substructure Backgolnd KnowledgeEumpie

ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt

su~ported

As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge

lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle

ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s

SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y

Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute

est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt

he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn

n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and

~1

1--- ~ed v blue ~

i I shyI

I~

I

I

I

I

SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE

Fgure 39 Three Background Knoledge $ucstroJcJres

he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e

new substructu-es In terms of the substructures already known

3 bull 42 IdentifyiD1 Substructures in Eumples

As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-

~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___

~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I

i SH~PETRIAGlE i t

I I

I I I SHAPEmiddotSQlARE

[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]

Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]

Fig-ure 310 Substructure IdentiAcation Eumple

substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles

disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks

--

lll=H Eumple - r

i III

I I H

8r- shyj

V V

Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red

C Value(S)a8

L 6 a

al uelt S)-312

alueltS)-21

0 alue(S)96

middotalue(S)-11

6 I

bt

10 ~ V

- -

I

alue(S)middot31~ aue(S~-96 -- - middotalue(S)92

A I

j

aI ValueS)-312 I

I I

I bull

---

- -I

Vaiue(S)-103

rgure 41 First Etample of EXiIrinent 1

elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS

of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll

EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$

overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0

consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r

be Input examples

2 Ex~riment 2 Specialization and Background Knowledge

The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem

Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0

~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS

lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege

During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task

As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut

discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs

acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa

Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed

subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk

The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure

-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e

cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s

shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese

discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized

substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed

rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces

the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed

substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe

background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to

deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe

background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres

Ii

C 0

H- C C - Ii C1

(Sr C1 rl

C C ~

C C C-H C C- ii C C-Ii 1 i

~ Sr- C C C

H-C C-Ii H C

Ii H

H

fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule

13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples

tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~

of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna

clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn

whIch to focus ile attentIon of the conceptual clusterI~ rocess

A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA

Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer

lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific

Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e

added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features

t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and

CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered

Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe

SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas

r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single

)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d

subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle

substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure

spannng ncre han one ualple

Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni

and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle

53

nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy

descrlOed il [10gensel87J the two best d usterngs discovered are

Color of engine Aheels IS Blackmiddot middotWhitemiddot

W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd

subs-~c~~re discovery =odule is

lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt

In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By

addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI

exa~Fles Je ~wo best lusterngs discovered lre

umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four

Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec

JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure

attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen

har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE

T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~

suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the

StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne

IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree

that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators

within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover

macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~

CiscJsses similrlties and differences between SLBDLE and PLoSD

Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure

of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The

Jperators forths conaln are taken from [isson80] and are repeated e10w

PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY

PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)

STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)

LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)

For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec

e

51

STACK(XZ)

~ CSTACKCYZ) PICKLPOi)

PCTDOW(Y)

Figure 49 Diseovered (alto-operator

system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy

vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be

~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex

45 Eltperiment 5 Data Abstraction and Feature Formation

EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he

cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3

arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng

on the same Tu3S Instruments Eplorer as the SLBDCE system

Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les

given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores

to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in

the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of

178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J

5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C

SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c

61

--

~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll

Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~

hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH

program searCles for tWo tpes of substrucure In tbe blocks world domain The first type

involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a

set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by

the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes

of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall

1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he

ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of

SlBDLE

lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S

cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce

OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence

~ ~ shyB

I ~ (

E G- c c=7 0 IA B CD

i Lr

(a) ( )

63

A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1

0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL

E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL

The tommcm-realiotships-list contairs the four relations Ossessed by more than half the

candidates

Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt

~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le

Sumber of propUiH In Intltwto

Number of propenlH n Ilnlon

vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-

ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre

A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5

65

~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e

ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~

difficult Running both tbe sequence and common relation discovery processes nore tban once

mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl

difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd

domain

T~e final comFarlson of 11e two syseS Involves ~he representation used to store the

discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the

subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe

sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot

lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1

roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse

Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines

comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe

semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto

oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0

SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert

background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l

more general epresentaticn such as the semantic network used y ~he ARCH pro~ran

53 COfDitive Optimization in olrs S~R Program

R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte

~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt

[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S

Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-

-(5-s)C =---shy

s

slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s

represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1

polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a

inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR

seeks a grammar whose eiementS m8llnize eVe

S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n

322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number

of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f

OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings

of he substrucre can be upressed as

Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy

cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not

conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on

he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into

~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture

lmprovements to tbe cognltive savings measure

~ Substructure Oiscovery ill Uihitehalls PLAD System

Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons

These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn

69

The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as

Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs

efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda

overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta

mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence

Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ

COXTEXT 1 ogsav macro OS

100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot

8 5 ~u - (~tmiddot z)middot

Instantiated Action Sequlnce lt B 112 Z ~11J Z

COlEXT 2 ogsav macro~s

20 ~l bull (~lumiddot Z)shy

Instantiated Action ~uence A B 11

Al interesting macrops discovered End

Figure 53 PLAXD Etam~le

1

OLPTER 6

COSC1USION

Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S

of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty

Url about such an environnent the entity nust abstract over unInuresting jetJil and discover

substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s

teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced

domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l

this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve

61 Summary

The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl

onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s

aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e

environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge

equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl

aleratves nvoiec In 3 computational method 01 discovering sustrucure

A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y

he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans

omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg

t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre

imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5

T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~

iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~

he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~

optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~

The ~ialization module modina tile substructure to apply in a more constrained ewironme

Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i

nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S

aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f

be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes

and the possible applications of the SLBDLE substructure discovery system in a vanety f

domaltls

Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf

~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-

lass vf problems in the area of nachine learning OJeraton In real-world domains demands f

eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates

n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll

of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes

Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1

S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~

62 Future Research

$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE

system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei

include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS

Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy

reVIOUS suess of silIlLu rJe appiiauns

esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of

substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can

tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]

lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can

be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions

in both the background knowledge architecture 3nd the opeations peforned by the discovery

algorlthm

Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess

may arise from a better method of substrlctie instantiation C rrent letbocs do not

satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation

by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C

lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres

exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba

Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods

would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery

process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt

the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be

discovered with a single application of the discovery algcnthr1

Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~

uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve

the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery

rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge

z~ess l ~

One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount

knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a

knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts

623 Applications

Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste

SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and

compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung

systems

One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage

composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e

eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon

slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of

abstraction

The n~ to access information fom larse databases has betooe oonlonlac In ncs

omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so

do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver

repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard

Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte

database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture

sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1

9

REFERE~CES

[Doyie79J

[HoaS3]

L --]~ ason

~Iicalski33bl

G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)

J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162

J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27

R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288

K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987

W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933

W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947

J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J

General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16

J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977

D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC

~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239

R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy

R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13

R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363

S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3

T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50

J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp

~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~

83

- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es

DeJ IS 11

dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T

s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11

nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11

race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T

Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order

=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je

~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form

(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I

The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC

The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure

ltela01s s the list ~f ~eatlons deining le substrucure r occurenc

In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage

dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec

85

gt bull ~

)~c1 tt I

0 c5 ~ -

13 Output for First Example of Experiment 1 Limit - 6

3qll rilebullbullbullbull

anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~

Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll

S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi

-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I

(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l

Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD

-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i

S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)

11 aCC

A14 Output for First Example of E(periment 1 Limit 10

gt svIdue liait 10)

n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t

Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco

T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~

S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO

~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS

~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J

A 16 Input for Second Example ot Experiment 1

Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J

conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)

Al7 Output for Second Example ot Etperitnent 1 Limit - 4

l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I

5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs

c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I

middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt

middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I

So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS

coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j

ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103

~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~

89

S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a

ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO

~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H

ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09

lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()

~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull

[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I

l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i

[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~

C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i

SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1

co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i

middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))

Ec ~lce

Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10

Betti ~Ice

i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~

umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1

SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -

1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G

91

s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no

TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~

~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f

=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ

~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i

A2 Experiment 2

Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd

saostructure background knowledge nodule Two examples from the chemical dooain ae

eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng

previously disc ered subst-uc~ures to subsequent discovery tasks

11 Input for First Eumple of poundXperimenl 2

kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i

S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))

slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)

sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J

~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI

Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))

VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)

( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I

(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i

(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I

sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)

Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j

SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)

~o e ) ~Wlnt ubstUC~oltes

Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo

95

gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~

U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l

~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J

carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )

Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))

Defxollrii ilill J ~cIl Clr carJ)

I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)

Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J

ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))

A32 Output for Experiment 3

gt lubclue inl~ JO)

Seli tlcebullbull_

m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil

SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo

aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot

ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~

licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)

Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc

wri OCCtilJtOiCS

101

4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4

~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i

(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))

42 Output for Experiment 4

PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil

Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)

ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl

u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1

O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~

s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI

W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -

~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a

S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1

TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a

103

~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~

Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-

bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)

ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)

I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))

r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l

(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)

(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)

coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)

Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i

(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at

Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e

(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ

~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj

OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs

I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten

~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull

9

SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1

I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1

Addina substructures 0 SKbullbullbull

O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1

5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I

A3 Experimeut 3

Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15

hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le

desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e

99

sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~

5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~

1~Ie c2 lt6 ~u)

QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32

c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)

lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J

SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))

A52 Output for Experiment S

gt (sulxh bullbull lmu )

Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll

RUMinl sulntucture discovey

DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl

Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~

WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11

( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I

(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I

[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt

~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90

Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J

ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot

~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I

10$

bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D

icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)

([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD

((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I

([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~

101

middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a

oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0

CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1

Ed ~lCe

109

Page 8: DISCOVERING SUBSTRUCTURE IN EXAMPLES

QPTER 1

LTRODtCI10N

At any Iven moment the amount of detailed information available fr~m an envlronmel s

overwhelmIng For elample dose observation of a brick wall reveals not only tbe rOINS d

rectang11ar brIckS but also tbe mortAr between tbe bricks the pitted surface of the brIcks and

mortar small cracks In tbe bricks etc Yet hUmans bave the ability to Ignore such detail ard

ettract infermation from the environment at a level of detail that is appropriate for he purpose cf

tbe ocservatlon Even n an unfamiliar environment lumans ignore intricate det1il and iscert

more abstract patterns in tle stimuli of the environment [Witkin83] This thesis describes a

com~utatonal mettod for discovermg abstract patterns or substruct~re in the descriptions 0 a

strucured enVrgtnment

Vhen obser atlons at varying levels of detail are necessary ~uruns are caable of descendrg

nto be more minute structure of the environmental stimuli and centlfymg patterns In arms f

hese stl1cural primitlves From the observations at different levels of detail humans na

construct a lle-archical description of the envirlnoent For nstance tbe brIckmiddot loll an be

descrbed as he rectangular bricks in tbe wall along with tbe interconnecons betmiddotveen tle mcilts

Furthermore each rick can be described in terms of the pitS racks and embedded graIns in he

bnck and each interconnecion can be described by the cl11pcnelts oi h~ mortar that ombme ~

form tle IntercoMecion Thus each Ievei in ~he descrption ~ierarchy rerresents I different eel

of detail for represen~ing ~he environment

Once such a bleoarchy is constructed similar prmitive struc res in diffeent middotWlCrmels

nay suggest tle Uistence of mere abstract featres higher ill n he hlerarhy Tls s~rltl

1

nty reFrese1ng the s~osJc~lre ~S slmpLfrg tergra lU lttl-lt~S I- lcdr e

1ewly ciscoveed substr-cure is s~iaiized by aire~jng adcli0ral s~rmiddotJ~ re T1S )lta ~

s~=stJcture descees l ~ager nore sptcfic poton of the -If~rt set of llut ltumies --e

suostJctU baclqrunc knowledge rcludes botll user-defined and discovered subs~J

definitions These are used to identify instances of substructures tilat exist In a sIVer set of ili

erampies Chapter 2 concludes by outlining a substructure discovery algortlm Incorporatng ~he

aforementlcned cmponents

Chapter 3 describes the implementation of the substructure discovery nethodoiogy ontained

In the StmiddotBDtmiddotE system In addition to the substructure discovery module StBDLE also lcludes a

substructure specaliZltion module and a substructure background know1edge module After a

substructure is d~oveed the substructure lS specIalized and botil tile original and secaiized

substructures are stored in tle background knowlCdge Within the background iinomiddotlledge the

substructures are kept in a hierarclly tbat defines omplu substructures in terms of more primltve

substructures tpon receiving a new set of input examples the background know~dgt rnodJe

determines which of the stored suostructures are resent in the etamples ThJs as he SLBDLE

system runs a hie3rcllical representation of selected structures found in he env ironnent ~s

constructed within tlle background knowledge modJle

Chapter presents several experiments witll the StBDLE systec EXFements wall tne

s1bstructure discovery module indicate that the suostucture generation and substructure seiecton

processes triorm vell in guiding the search towards more interestln~ substructures Other

experiments incorporating botll the substructure background keowledge module lnd spetallutlon

module demonstate tbe performance improvement obtaned by using revlousiy earned

sulstructures In subsequent discovery tASks The input and output data for each exerlent are

isted in ApperdlI A

Chapter strleys -revious work -elated to suostJcure disc)ery T~lS ~orK ilS ak l be euly 1900s Imiddotlen gestalt psychologists began sucying the InderYI~g ncess5 Lmiddotec i

3

CHAPTER 2

stBSTRUCTtJRE DISCOVERY

Te major focus of t1is work IS to investigate methods for discoveing substructure concepts

in examples Substructure dlScovery is tle process of identfYlng concepts desclbl1g n~ere~tIni

and repetitive -hunks- of structure within the individual elements of a set of structured examples

Once such 3 substructure oncept s discovered the descrIptions of the examples can be Simplified

by replacing all occurences of ~he substructure ith a single form that represents the

substructure The simplified descriptions may then be passed to other learning syst~ms In

lddition tbe discovery process nay be apFlied repetitively to urther simplify the deserlptions or

to build a hierarchical interpreuticn of the uamples in terms of hell subparts

ThIS hapter presents ~he components involved in a cgtmilItatlonal nethod for SiJostrJcture

discovery SKtion 21 disc1sses the motivations for developing suet a method anc he importance

of substructure discovery in the 5eld of artificial intelligence Seton 22 1eftnes a stbn~crWt

along with the components comprising a substructure [etods for geneatlng alte-1ative

substructures are presented 1ft Section 21 and the tK1niques used for inteHigently selecting among

these alternatives are discussed in Section 24 Once an appropriate substructure 15 discovered the

substructure is irurantUud for ncb occur-ence of the substructure in the input examples Sect~on

2 discusses substructure irstantiatl0ft ewly discovered sUOstuetures ~an be ipecialized to

describe larger substruc~ures in the input eumples Section 26 disc~sses suosrlctlre

s~aization Section 27 presents methods for using background lo~ledge in the SIoStrlctlre

dlscovery process Finllly Section 28 outlines a substluure Es-oery alsolthm lncorpcrawg

the previously described t~hni~ues

the observaton By tercelvmg only he appropriate Informatior lumans U~ bener abie to el-

the concepts implied by the environment

Thus the undedyu~ motivations for investigating substructure ~overy are the IblqlJlto~

hlenrchical structure in the world and 1e ease with Ntllch bumans can negotiate the henrchy

With an appropriate representation for substructures and the substJcture hierarchy a

substructure discovery sysem can perform the asks Just presented

22 Substructure Representation

A substructure is both a portion of a collection of structurally-related obects and a

cesclption of the concept represented by that portion For uample the detaied structure

CJmposlng ttle concet of a brIck is a substucure of a brck wall Tle collecion of atoms ard

bones compnslng he concept of an aromatic -ing 5 a substrucure of nany Clrgarllc ~cmpouncs

However substrucure concepts are ~ot always nteresng LPOl encountenng aJr1ck wal the

concept of a brick nay not be as interesting as tohe oncept Jf a doorway or Window T~e task of

SUls-ructure discovery is to ~nd i11lDurirtg subsluctue middotoncepts In a given 5-cicltlon d

structurally-related objects The representation of strllcured oncepts sbould be onducive to tile

wk of discovering substructures This sectior resents a grapblcal repfese~tatlon for suctufee

conceptS and a language for describing tne concepts

A coUlCtion of strJctured objects C3n be epfesentec by 1 sTaph CSlng il gnpl

representation a substJctlre is a collection of annotated nodes and eages comt=rlslng a onne~ed

slt~rlph J a larier jTaph The nodes epresenl single oects aics or eiltltols in the

Slbstuc~le lnd tlle edges -epresent the ion~lCtilrs emiddotmiddoteen ea~cns lnd ler arglrents Fr

annctated Ntb gt and o~e In~tl~ed Nltll gtft Te en odt s rnected c ~tl te a ngtce Jnd te

SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St

elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute

Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e

Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the

nput nampie n Figure 211 IS

lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt

red-I ill

green-i SI ~ ~------------~~~lt- --- OBJECT-0001

) ----Rl

1amp --- OBJECT-0002 amp-blue

I~l 154- red LshyI I

I j

blue blue

(1) I1put Example ()) SubstJcure

Fig1Jre 21 Exa~plt Substrlc~re

9

[nput Example Substlcture

A)

V

(b) Object Overlapping Substructure

(c) Object and Relation Overlapping Substructure

Figure 22 Disjoint and Overlappin Substructures

Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~

~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the

densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle

lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1

factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl

Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e

(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing

subsuIctures and substructure speialization to attach contextual nformation to newiy discovered

substructures

2S Substructure mstampl1tialioD

Once an interesting substructure is disltovered the input eumples can be recast by replactng

the obJects and relations of each occurrence of the substructure with a single entity representing

tbe abstact substructure This replacement is termed substructure illstanliatwlt After

pltrforming one substructure instantiation a substrueture discovery system may continue to

discover more abstract substructures in terms of those already instantiated in the input eumples

This section considers two methods of substrueture instantiation

One method of substructure instantiation replaees eaeb occurrence by a single object

Difficulties in using this method arise from representation roblems Although all the ob]fCts and

relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not

replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be

maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only

confounds the insuntiaion problem In the case of object overlap each instartation of the

overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of

relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as

well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre

instantiation using a singie new object Although single object substructure instantiation seens

intuitively promising tbe accompanying representation problems are difficult to overcome

An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of

the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the

slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the

17

3 SubStructure Generation

AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~

and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l

bottom-up and top-down

The bottom-up approach to substructure generation be~lns with the smallest substrJctures n

the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished

by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0

the substructure For enmple according to tbe tllee neighboring relations (presented in Section

2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the

substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures

lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt

lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt

lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt

The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O

S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For

example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10

suestrJures

ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt

The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie

suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0

smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be

11

lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1

jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_

uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly

dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec

substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1

are applicable here

Regardless of how tae metbods are applied each metbod bas advantages and disadvantages

Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev

SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle

combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a

larger substructure but a smaller more desirable substructure may be overlooked in the process

Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS

referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones

linimal expansion begins with smaller substructures expands substructures along one relation

lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource

imlts of tbe system

2~ Substructure Selection

Aier using tbe metbods of the previous sectlon to construct l set of alternatlve

substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0

onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e

roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie

substructures oased on ~her heuristic quality This section presents the major heuristics that 3re

Jpplicable to substructure evaluation

The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata

cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie

savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e

13

Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad

eiatlons of ~be origmal OCClrre1lces may be reconstructea

T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=

replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla

obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue

ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of

concepts deAned In teClS of the Instantiated substructures

26 Substructure Specialization

Specializing substructures is an essential capabtlity of a substructure discovery system For

Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f

mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences

Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the

aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached

atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this

specialization step allows the substructure discovery system to take advantage of additional

information in the input examples avoid learning overly general substructure concepts and Clore

rapidly discover a specific disjunctive substructure concet T115 section resents an approach to

substructure specialization

One technique for specializing a substructure is to ~rform inductive Inference on the

extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the

substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended

occurrences consists of the substructures obtained by upanding each occurrence llth each

neighboring relation of the occurrence Once the set of eltended occurrences is onstructed

Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added

neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg

relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres

19

2 - Background Knowledampe

Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s

wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e

discovery process along more promising paths through tbe space of alternatlve substructJres T~lS

section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate

substructures that are more likely to list in tbe current application dOClaln

The ~wo major functions of tbe substructure backround knowledge are to main tain boh

lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a

glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of

substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu

substructures are defined in terms of more primitive substructures This arrangement suggests ~n

archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie

substructures serve as the justifications for more complex substructures at a higher level in tle

hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote

substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie

substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures

elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying

tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput

uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are

ultimately justified by the relations in tbe input eumples The substructure discoey roess

may then use these background knowledge substructures as a starting point for ieneratllg

alternative substructures

Other f Jrms of background knowledge apply to the task of substrlcure ilscovery

mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses

~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee

1

L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do

BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do

if (not (member e 0raquo S S U leI

return D

Figure 24 SubstrUcture Discovery ~lgoritbm

Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom

the substructures In S until either the computational limlt L is exceeded or the set of candidate

substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe

heuristics of Section 2A are employed to choose the best substructure from the llternatives in S

The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he

implementation Section 322 descibes the comrutatlon used in StBDlE although otbe

methods may be used For instance the heutlStics could be weighted selected by lhe user or

selected by lhe backcround knowledae However uy substructure selecion method should

involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in

BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of

discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi

Section 23 are then used to construct a set of new substructures that are stored in E Those

subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the

loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res

23

CHAPTER 3

mE SUBDUE SYSTE~I

An implementation of he substructure discovery methodology descrIbed in the frevous

chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments

Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a

substructure concept discoverer and as a mod le in a more robust nachlne learning system In

addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a

sbstructure specialization module and a substructure background knowledge module for utiiizilg

prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd

krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres

ovillCh of these substruc1res are resent in the lnput examples

i

SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of

SubstructJre Speeializer-k of(

C ser-Supplied Input Exampies Substructure Discovery

Figure 31 Tlle SlSDLE System

(a) SUbstructure

[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]

(b) External Representation

I I

~ Sl

i V

~-T t -

I

~ Cl i

(c) Inte~al Represe~tation

Figure 32 Substructure Representation

21

SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do

SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI

return -EWSlBS

Figure 33 Substructure Elpanslon Algorithm

Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae

stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are

stored in For each neighboring relation a new substructure is formed by adding the nelghborrg

-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to

the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een

considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes

~xanded from the original substructure

322 Selection

The heuristic-based substructure discovery module selects for orslderation those

substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs

compactness connectivity and covenge These four heurlstus are used 0 order he set of

alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut

ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~

selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons

~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he

eurstlc value of a substructur

29

urlent set of Input xam-llS

deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e

substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le

Input examples

r rtlations Sl compactnns(S) = ----shy

For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus

conpactness bull 4 bull 1

The third heUristic connectivity measures the amount of extemal connection in 1he

occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of

external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be

onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed

connectivit-Spound) = locel

Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In

Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure

22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO

outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons

Thus connectivity (12 41 bull 13

T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es

31

ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt

First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~

tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as

there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle

substructure as denned in Section 3U The Object names within the substructures le aroltrar

symbols generated by the system for eacb ewly constructed substructure

S bull i lt 11 OBJECT-0001gt f

~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S

and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by

addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5

ae st)red In E

Input Example Substructure

amp i 511

~I Rl I- lSi ~

52 I S I

LJ L

Figure 3-1 Simple Example

33

11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~

1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l

le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~

suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~

by the algorabm

33 Sllbstructure Specialization

The substructure specialiution module In SLBDLE employs t simple teChlllq ue r

s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25

SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added

attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e

)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh

the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e

i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota

eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d

knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e

s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton

rocess

331 Specia1ization Algorithm

Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te

algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~

elamrles

he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~

Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l

osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~

eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e

35

~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor

Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle

mount of specializatlon In Srpc

A-rount_of_spKlaLization( S_) = --_______ (occl

The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture

acKiround knowledge along w1tb the originally discovered substrucure

332 Example

As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a

lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute

-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he

sare substruc~ure emerges as in Figure 3-

S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt

ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule

Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES

A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l

~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli

s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt

Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~

SPECSLBS In thIs example IS

S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt

T~is specialized substructure also las J occurrerces but 3 untque values thus

amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC

specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he

substrucnre background knowledge along with the oriiinally discovered substucur

SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon

about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be

s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred

substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq

SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue

nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization

nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be

mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec

s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks

3 SubstMlcture Background Kllowledge

T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts

39

O T SHA ETRL~GLE SHAPE-SQLARE

Figure 37 Substructure Backgolnd KnowledgeEumpie

ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt

su~ported

As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge

lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle

ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s

SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y

Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute

est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt

he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn

n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and

~1

1--- ~ed v blue ~

i I shyI

I~

I

I

I

I

SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE

Fgure 39 Three Background Knoledge $ucstroJcJres

he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e

new substructu-es In terms of the substructures already known

3 bull 42 IdentifyiD1 Substructures in Eumples

As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-

~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___

~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I

i SH~PETRIAGlE i t

I I

I I I SHAPEmiddotSQlARE

[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]

Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]

Fig-ure 310 Substructure IdentiAcation Eumple

substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles

disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks

--

lll=H Eumple - r

i III

I I H

8r- shyj

V V

Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red

C Value(S)a8

L 6 a

al uelt S)-312

alueltS)-21

0 alue(S)96

middotalue(S)-11

6 I

bt

10 ~ V

- -

I

alue(S)middot31~ aue(S~-96 -- - middotalue(S)92

A I

j

aI ValueS)-312 I

I I

I bull

---

- -I

Vaiue(S)-103

rgure 41 First Etample of EXiIrinent 1

elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS

of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll

EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$

overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0

consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r

be Input examples

2 Ex~riment 2 Specialization and Background Knowledge

The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem

Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0

~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS

lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege

During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task

As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut

discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs

acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa

Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed

subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk

The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure

-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e

cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s

shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese

discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized

substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed

rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces

the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed

substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe

background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to

deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe

background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres

Ii

C 0

H- C C - Ii C1

(Sr C1 rl

C C ~

C C C-H C C- ii C C-Ii 1 i

~ Sr- C C C

H-C C-Ii H C

Ii H

H

fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule

13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples

tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~

of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna

clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn

whIch to focus ile attentIon of the conceptual clusterI~ rocess

A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA

Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer

lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific

Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e

added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features

t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and

CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered

Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe

SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas

r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single

)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d

subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle

substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure

spannng ncre han one ualple

Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni

and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle

53

nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy

descrlOed il [10gensel87J the two best d usterngs discovered are

Color of engine Aheels IS Blackmiddot middotWhitemiddot

W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd

subs-~c~~re discovery =odule is

lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt

In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By

addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI

exa~Fles Je ~wo best lusterngs discovered lre

umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four

Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec

JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure

attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen

har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE

T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~

suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the

StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne

IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree

that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators

within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover

macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~

CiscJsses similrlties and differences between SLBDLE and PLoSD

Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure

of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The

Jperators forths conaln are taken from [isson80] and are repeated e10w

PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY

PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)

STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)

LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)

For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec

e

51

STACK(XZ)

~ CSTACKCYZ) PICKLPOi)

PCTDOW(Y)

Figure 49 Diseovered (alto-operator

system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy

vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be

~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex

45 Eltperiment 5 Data Abstraction and Feature Formation

EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he

cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3

arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng

on the same Tu3S Instruments Eplorer as the SLBDCE system

Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les

given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores

to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in

the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of

178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J

5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C

SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c

61

--

~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll

Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~

hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH

program searCles for tWo tpes of substrucure In tbe blocks world domain The first type

involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a

set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by

the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes

of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall

1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he

ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of

SlBDLE

lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S

cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce

OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence

~ ~ shyB

I ~ (

E G- c c=7 0 IA B CD

i Lr

(a) ( )

63

A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1

0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL

E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL

The tommcm-realiotships-list contairs the four relations Ossessed by more than half the

candidates

Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt

~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le

Sumber of propUiH In Intltwto

Number of propenlH n Ilnlon

vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-

ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre

A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5

65

~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e

ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~

difficult Running both tbe sequence and common relation discovery processes nore tban once

mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl

difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd

domain

T~e final comFarlson of 11e two syseS Involves ~he representation used to store the

discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the

subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe

sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot

lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1

roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse

Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines

comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe

semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto

oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0

SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert

background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l

more general epresentaticn such as the semantic network used y ~he ARCH pro~ran

53 COfDitive Optimization in olrs S~R Program

R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte

~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt

[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S

Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-

-(5-s)C =---shy

s

slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s

represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1

polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a

inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR

seeks a grammar whose eiementS m8llnize eVe

S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n

322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number

of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f

OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings

of he substrucre can be upressed as

Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy

cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not

conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on

he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into

~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture

lmprovements to tbe cognltive savings measure

~ Substructure Oiscovery ill Uihitehalls PLAD System

Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons

These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn

69

The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as

Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs

efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda

overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta

mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence

Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ

COXTEXT 1 ogsav macro OS

100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot

8 5 ~u - (~tmiddot z)middot

Instantiated Action Sequlnce lt B 112 Z ~11J Z

COlEXT 2 ogsav macro~s

20 ~l bull (~lumiddot Z)shy

Instantiated Action ~uence A B 11

Al interesting macrops discovered End

Figure 53 PLAXD Etam~le

1

OLPTER 6

COSC1USION

Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S

of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty

Url about such an environnent the entity nust abstract over unInuresting jetJil and discover

substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s

teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced

domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l

this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve

61 Summary

The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl

onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s

aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e

environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge

equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl

aleratves nvoiec In 3 computational method 01 discovering sustrucure

A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y

he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans

omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg

t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre

imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5

T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~

iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~

he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~

optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~

The ~ialization module modina tile substructure to apply in a more constrained ewironme

Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i

nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S

aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f

be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes

and the possible applications of the SLBDLE substructure discovery system in a vanety f

domaltls

Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf

~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-

lass vf problems in the area of nachine learning OJeraton In real-world domains demands f

eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates

n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll

of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes

Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1

S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~

62 Future Research

$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE

system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei

include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS

Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy

reVIOUS suess of silIlLu rJe appiiauns

esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of

substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can

tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]

lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can

be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions

in both the background knowledge architecture 3nd the opeations peforned by the discovery

algorlthm

Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess

may arise from a better method of substrlctie instantiation C rrent letbocs do not

satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation

by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C

lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres

exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba

Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods

would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery

process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt

the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be

discovered with a single application of the discovery algcnthr1

Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~

uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve

the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery

rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge

z~ess l ~

One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount

knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a

knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts

623 Applications

Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste

SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and

compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung

systems

One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage

composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e

eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon

slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of

abstraction

The n~ to access information fom larse databases has betooe oonlonlac In ncs

omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so

do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver

repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard

Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte

database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture

sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1

9

REFERE~CES

[Doyie79J

[HoaS3]

L --]~ ason

~Iicalski33bl

G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)

J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162

J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27

R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288

K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987

W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933

W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947

J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J

General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16

J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977

D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC

~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239

R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy

R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13

R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363

S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3

T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50

J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp

~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~

83

- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es

DeJ IS 11

dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T

s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11

nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11

race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T

Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order

=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je

~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form

(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I

The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC

The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure

ltela01s s the list ~f ~eatlons deining le substrucure r occurenc

In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage

dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec

85

gt bull ~

)~c1 tt I

0 c5 ~ -

13 Output for First Example of Experiment 1 Limit - 6

3qll rilebullbullbullbull

anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~

Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll

S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi

-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I

(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l

Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD

-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i

S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)

11 aCC

A14 Output for First Example of E(periment 1 Limit 10

gt svIdue liait 10)

n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t

Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco

T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~

S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO

~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS

~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J

A 16 Input for Second Example ot Experiment 1

Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J

conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)

Al7 Output for Second Example ot Etperitnent 1 Limit - 4

l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I

5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs

c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I

middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt

middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I

So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS

coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j

ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103

~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~

89

S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a

ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO

~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H

ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09

lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()

~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull

[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I

l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i

[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~

C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i

SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1

co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i

middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))

Ec ~lce

Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10

Betti ~Ice

i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~

umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1

SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -

1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G

91

s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no

TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~

~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f

=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ

~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i

A2 Experiment 2

Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd

saostructure background knowledge nodule Two examples from the chemical dooain ae

eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng

previously disc ered subst-uc~ures to subsequent discovery tasks

11 Input for First Eumple of poundXperimenl 2

kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i

S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))

slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)

sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J

~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI

Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))

VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)

( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I

(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i

(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I

sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)

Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j

SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)

~o e ) ~Wlnt ubstUC~oltes

Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo

95

gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~

U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l

~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J

carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )

Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))

Defxollrii ilill J ~cIl Clr carJ)

I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)

Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J

ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))

A32 Output for Experiment 3

gt lubclue inl~ JO)

Seli tlcebullbull_

m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil

SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo

aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot

ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~

licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)

Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc

wri OCCtilJtOiCS

101

4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4

~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i

(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))

42 Output for Experiment 4

PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil

Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)

ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl

u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1

O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~

s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI

W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -

~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a

S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1

TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a

103

~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~

Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-

bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)

ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)

I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))

r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l

(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)

(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)

coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)

Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i

(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at

Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e

(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ

~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj

OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs

I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten

~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull

9

SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1

I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1

Addina substructures 0 SKbullbullbull

O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1

5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I

A3 Experimeut 3

Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15

hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le

desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e

99

sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~

5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~

1~Ie c2 lt6 ~u)

QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32

c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)

lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J

SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))

A52 Output for Experiment S

gt (sulxh bullbull lmu )

Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll

RUMinl sulntucture discovey

DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl

Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~

WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11

( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I

(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I

[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt

~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90

Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J

ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot

~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I

10$

bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D

icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)

([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD

((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I

([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~

101

middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a

oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0

CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1

Ed ~lCe

109

Page 9: DISCOVERING SUBSTRUCTURE IN EXAMPLES

nty reFrese1ng the s~osJc~lre ~S slmpLfrg tergra lU lttl-lt~S I- lcdr e

1ewly ciscoveed substr-cure is s~iaiized by aire~jng adcli0ral s~rmiddotJ~ re T1S )lta ~

s~=stJcture descees l ~ager nore sptcfic poton of the -If~rt set of llut ltumies --e

suostJctU baclqrunc knowledge rcludes botll user-defined and discovered subs~J

definitions These are used to identify instances of substructures tilat exist In a sIVer set of ili

erampies Chapter 2 concludes by outlining a substructure discovery algortlm Incorporatng ~he

aforementlcned cmponents

Chapter 3 describes the implementation of the substructure discovery nethodoiogy ontained

In the StmiddotBDtmiddotE system In addition to the substructure discovery module StBDLE also lcludes a

substructure specaliZltion module and a substructure background know1edge module After a

substructure is d~oveed the substructure lS specIalized and botil tile original and secaiized

substructures are stored in tle background knowlCdge Within the background iinomiddotlledge the

substructures are kept in a hierarclly tbat defines omplu substructures in terms of more primltve

substructures tpon receiving a new set of input examples the background know~dgt rnodJe

determines which of the stored suostructures are resent in the etamples ThJs as he SLBDLE

system runs a hie3rcllical representation of selected structures found in he env ironnent ~s

constructed within tlle background knowledge modJle

Chapter presents several experiments witll the StBDLE systec EXFements wall tne

s1bstructure discovery module indicate that the suostucture generation and substructure seiecton

processes triorm vell in guiding the search towards more interestln~ substructures Other

experiments incorporating botll the substructure background keowledge module lnd spetallutlon

module demonstate tbe performance improvement obtaned by using revlousiy earned

sulstructures In subsequent discovery tASks The input and output data for each exerlent are

isted in ApperdlI A

Chapter strleys -revious work -elated to suostJcure disc)ery T~lS ~orK ilS ak l be euly 1900s Imiddotlen gestalt psychologists began sucying the InderYI~g ncess5 Lmiddotec i

3

CHAPTER 2

stBSTRUCTtJRE DISCOVERY

Te major focus of t1is work IS to investigate methods for discoveing substructure concepts

in examples Substructure dlScovery is tle process of identfYlng concepts desclbl1g n~ere~tIni

and repetitive -hunks- of structure within the individual elements of a set of structured examples

Once such 3 substructure oncept s discovered the descrIptions of the examples can be Simplified

by replacing all occurences of ~he substructure ith a single form that represents the

substructure The simplified descriptions may then be passed to other learning syst~ms In

lddition tbe discovery process nay be apFlied repetitively to urther simplify the deserlptions or

to build a hierarchical interpreuticn of the uamples in terms of hell subparts

ThIS hapter presents ~he components involved in a cgtmilItatlonal nethod for SiJostrJcture

discovery SKtion 21 disc1sses the motivations for developing suet a method anc he importance

of substructure discovery in the 5eld of artificial intelligence Seton 22 1eftnes a stbn~crWt

along with the components comprising a substructure [etods for geneatlng alte-1ative

substructures are presented 1ft Section 21 and the tK1niques used for inteHigently selecting among

these alternatives are discussed in Section 24 Once an appropriate substructure 15 discovered the

substructure is irurantUud for ncb occur-ence of the substructure in the input examples Sect~on

2 discusses substructure irstantiatl0ft ewly discovered sUOstuetures ~an be ipecialized to

describe larger substruc~ures in the input eumples Section 26 disc~sses suosrlctlre

s~aization Section 27 presents methods for using background lo~ledge in the SIoStrlctlre

dlscovery process Finllly Section 28 outlines a substluure Es-oery alsolthm lncorpcrawg

the previously described t~hni~ues

the observaton By tercelvmg only he appropriate Informatior lumans U~ bener abie to el-

the concepts implied by the environment

Thus the undedyu~ motivations for investigating substructure ~overy are the IblqlJlto~

hlenrchical structure in the world and 1e ease with Ntllch bumans can negotiate the henrchy

With an appropriate representation for substructures and the substJcture hierarchy a

substructure discovery sysem can perform the asks Just presented

22 Substructure Representation

A substructure is both a portion of a collection of structurally-related obects and a

cesclption of the concept represented by that portion For uample the detaied structure

CJmposlng ttle concet of a brIck is a substucure of a brck wall Tle collecion of atoms ard

bones compnslng he concept of an aromatic -ing 5 a substrucure of nany Clrgarllc ~cmpouncs

However substrucure concepts are ~ot always nteresng LPOl encountenng aJr1ck wal the

concept of a brick nay not be as interesting as tohe oncept Jf a doorway or Window T~e task of

SUls-ructure discovery is to ~nd i11lDurirtg subsluctue middotoncepts In a given 5-cicltlon d

structurally-related objects The representation of strllcured oncepts sbould be onducive to tile

wk of discovering substructures This sectior resents a grapblcal repfese~tatlon for suctufee

conceptS and a language for describing tne concepts

A coUlCtion of strJctured objects C3n be epfesentec by 1 sTaph CSlng il gnpl

representation a substJctlre is a collection of annotated nodes and eages comt=rlslng a onne~ed

slt~rlph J a larier jTaph The nodes epresenl single oects aics or eiltltols in the

Slbstuc~le lnd tlle edges -epresent the ion~lCtilrs emiddotmiddoteen ea~cns lnd ler arglrents Fr

annctated Ntb gt and o~e In~tl~ed Nltll gtft Te en odt s rnected c ~tl te a ngtce Jnd te

SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St

elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute

Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e

Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the

nput nampie n Figure 211 IS

lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt

red-I ill

green-i SI ~ ~------------~~~lt- --- OBJECT-0001

) ----Rl

1amp --- OBJECT-0002 amp-blue

I~l 154- red LshyI I

I j

blue blue

(1) I1put Example ()) SubstJcure

Fig1Jre 21 Exa~plt Substrlc~re

9

[nput Example Substlcture

A)

V

(b) Object Overlapping Substructure

(c) Object and Relation Overlapping Substructure

Figure 22 Disjoint and Overlappin Substructures

Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~

~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the

densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle

lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1

factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl

Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e

(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing

subsuIctures and substructure speialization to attach contextual nformation to newiy discovered

substructures

2S Substructure mstampl1tialioD

Once an interesting substructure is disltovered the input eumples can be recast by replactng

the obJects and relations of each occurrence of the substructure with a single entity representing

tbe abstact substructure This replacement is termed substructure illstanliatwlt After

pltrforming one substructure instantiation a substrueture discovery system may continue to

discover more abstract substructures in terms of those already instantiated in the input eumples

This section considers two methods of substrueture instantiation

One method of substructure instantiation replaees eaeb occurrence by a single object

Difficulties in using this method arise from representation roblems Although all the ob]fCts and

relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not

replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be

maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only

confounds the insuntiaion problem In the case of object overlap each instartation of the

overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of

relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as

well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre

instantiation using a singie new object Although single object substructure instantiation seens

intuitively promising tbe accompanying representation problems are difficult to overcome

An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of

the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the

slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the

17

3 SubStructure Generation

AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~

and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l

bottom-up and top-down

The bottom-up approach to substructure generation be~lns with the smallest substrJctures n

the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished

by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0

the substructure For enmple according to tbe tllee neighboring relations (presented in Section

2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the

substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures

lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt

lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt

lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt

The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O

S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For

example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10

suestrJures

ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt

The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie

suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0

smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be

11

lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1

jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_

uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly

dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec

substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1

are applicable here

Regardless of how tae metbods are applied each metbod bas advantages and disadvantages

Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev

SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle

combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a

larger substructure but a smaller more desirable substructure may be overlooked in the process

Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS

referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones

linimal expansion begins with smaller substructures expands substructures along one relation

lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource

imlts of tbe system

2~ Substructure Selection

Aier using tbe metbods of the previous sectlon to construct l set of alternatlve

substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0

onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e

roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie

substructures oased on ~her heuristic quality This section presents the major heuristics that 3re

Jpplicable to substructure evaluation

The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata

cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie

savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e

13

Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad

eiatlons of ~be origmal OCClrre1lces may be reconstructea

T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=

replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla

obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue

ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of

concepts deAned In teClS of the Instantiated substructures

26 Substructure Specialization

Specializing substructures is an essential capabtlity of a substructure discovery system For

Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f

mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences

Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the

aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached

atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this

specialization step allows the substructure discovery system to take advantage of additional

information in the input examples avoid learning overly general substructure concepts and Clore

rapidly discover a specific disjunctive substructure concet T115 section resents an approach to

substructure specialization

One technique for specializing a substructure is to ~rform inductive Inference on the

extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the

substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended

occurrences consists of the substructures obtained by upanding each occurrence llth each

neighboring relation of the occurrence Once the set of eltended occurrences is onstructed

Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added

neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg

relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres

19

2 - Background Knowledampe

Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s

wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e

discovery process along more promising paths through tbe space of alternatlve substructJres T~lS

section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate

substructures that are more likely to list in tbe current application dOClaln

The ~wo major functions of tbe substructure backround knowledge are to main tain boh

lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a

glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of

substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu

substructures are defined in terms of more primitive substructures This arrangement suggests ~n

archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie

substructures serve as the justifications for more complex substructures at a higher level in tle

hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote

substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie

substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures

elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying

tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput

uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are

ultimately justified by the relations in tbe input eumples The substructure discoey roess

may then use these background knowledge substructures as a starting point for ieneratllg

alternative substructures

Other f Jrms of background knowledge apply to the task of substrlcure ilscovery

mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses

~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee

1

L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do

BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do

if (not (member e 0raquo S S U leI

return D

Figure 24 SubstrUcture Discovery ~lgoritbm

Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom

the substructures In S until either the computational limlt L is exceeded or the set of candidate

substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe

heuristics of Section 2A are employed to choose the best substructure from the llternatives in S

The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he

implementation Section 322 descibes the comrutatlon used in StBDlE although otbe

methods may be used For instance the heutlStics could be weighted selected by lhe user or

selected by lhe backcround knowledae However uy substructure selecion method should

involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in

BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of

discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi

Section 23 are then used to construct a set of new substructures that are stored in E Those

subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the

loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res

23

CHAPTER 3

mE SUBDUE SYSTE~I

An implementation of he substructure discovery methodology descrIbed in the frevous

chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments

Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a

substructure concept discoverer and as a mod le in a more robust nachlne learning system In

addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a

sbstructure specialization module and a substructure background knowledge module for utiiizilg

prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd

krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres

ovillCh of these substruc1res are resent in the lnput examples

i

SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of

SubstructJre Speeializer-k of(

C ser-Supplied Input Exampies Substructure Discovery

Figure 31 Tlle SlSDLE System

(a) SUbstructure

[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]

(b) External Representation

I I

~ Sl

i V

~-T t -

I

~ Cl i

(c) Inte~al Represe~tation

Figure 32 Substructure Representation

21

SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do

SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI

return -EWSlBS

Figure 33 Substructure Elpanslon Algorithm

Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae

stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are

stored in For each neighboring relation a new substructure is formed by adding the nelghborrg

-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to

the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een

considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes

~xanded from the original substructure

322 Selection

The heuristic-based substructure discovery module selects for orslderation those

substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs

compactness connectivity and covenge These four heurlstus are used 0 order he set of

alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut

ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~

selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons

~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he

eurstlc value of a substructur

29

urlent set of Input xam-llS

deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e

substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le

Input examples

r rtlations Sl compactnns(S) = ----shy

For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus

conpactness bull 4 bull 1

The third heUristic connectivity measures the amount of extemal connection in 1he

occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of

external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be

onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed

connectivit-Spound) = locel

Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In

Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure

22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO

outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons

Thus connectivity (12 41 bull 13

T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es

31

ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt

First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~

tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as

there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle

substructure as denned in Section 3U The Object names within the substructures le aroltrar

symbols generated by the system for eacb ewly constructed substructure

S bull i lt 11 OBJECT-0001gt f

~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S

and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by

addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5

ae st)red In E

Input Example Substructure

amp i 511

~I Rl I- lSi ~

52 I S I

LJ L

Figure 3-1 Simple Example

33

11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~

1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l

le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~

suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~

by the algorabm

33 Sllbstructure Specialization

The substructure specialiution module In SLBDLE employs t simple teChlllq ue r

s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25

SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added

attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e

)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh

the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e

i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota

eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d

knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e

s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton

rocess

331 Specia1ization Algorithm

Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te

algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~

elamrles

he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~

Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l

osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~

eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e

35

~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor

Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle

mount of specializatlon In Srpc

A-rount_of_spKlaLization( S_) = --_______ (occl

The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture

acKiround knowledge along w1tb the originally discovered substrucure

332 Example

As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a

lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute

-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he

sare substruc~ure emerges as in Figure 3-

S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt

ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule

Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES

A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l

~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli

s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt

Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~

SPECSLBS In thIs example IS

S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt

T~is specialized substructure also las J occurrerces but 3 untque values thus

amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC

specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he

substrucnre background knowledge along with the oriiinally discovered substucur

SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon

about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be

s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred

substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq

SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue

nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization

nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be

mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec

s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks

3 SubstMlcture Background Kllowledge

T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts

39

O T SHA ETRL~GLE SHAPE-SQLARE

Figure 37 Substructure Backgolnd KnowledgeEumpie

ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt

su~ported

As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge

lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle

ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s

SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y

Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute

est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt

he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn

n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and

~1

1--- ~ed v blue ~

i I shyI

I~

I

I

I

I

SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE

Fgure 39 Three Background Knoledge $ucstroJcJres

he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e

new substructu-es In terms of the substructures already known

3 bull 42 IdentifyiD1 Substructures in Eumples

As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-

~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___

~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I

i SH~PETRIAGlE i t

I I

I I I SHAPEmiddotSQlARE

[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]

Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]

Fig-ure 310 Substructure IdentiAcation Eumple

substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles

disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks

--

lll=H Eumple - r

i III

I I H

8r- shyj

V V

Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red

C Value(S)a8

L 6 a

al uelt S)-312

alueltS)-21

0 alue(S)96

middotalue(S)-11

6 I

bt

10 ~ V

- -

I

alue(S)middot31~ aue(S~-96 -- - middotalue(S)92

A I

j

aI ValueS)-312 I

I I

I bull

---

- -I

Vaiue(S)-103

rgure 41 First Etample of EXiIrinent 1

elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS

of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll

EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$

overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0

consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r

be Input examples

2 Ex~riment 2 Specialization and Background Knowledge

The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem

Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0

~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS

lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege

During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task

As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut

discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs

acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa

Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed

subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk

The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure

-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e

cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s

shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese

discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized

substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed

rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces

the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed

substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe

background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to

deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe

background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres

Ii

C 0

H- C C - Ii C1

(Sr C1 rl

C C ~

C C C-H C C- ii C C-Ii 1 i

~ Sr- C C C

H-C C-Ii H C

Ii H

H

fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule

13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples

tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~

of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna

clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn

whIch to focus ile attentIon of the conceptual clusterI~ rocess

A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA

Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer

lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific

Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e

added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features

t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and

CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered

Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe

SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas

r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single

)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d

subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle

substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure

spannng ncre han one ualple

Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni

and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle

53

nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy

descrlOed il [10gensel87J the two best d usterngs discovered are

Color of engine Aheels IS Blackmiddot middotWhitemiddot

W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd

subs-~c~~re discovery =odule is

lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt

In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By

addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI

exa~Fles Je ~wo best lusterngs discovered lre

umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four

Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec

JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure

attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen

har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE

T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~

suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the

StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne

IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree

that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators

within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover

macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~

CiscJsses similrlties and differences between SLBDLE and PLoSD

Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure

of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The

Jperators forths conaln are taken from [isson80] and are repeated e10w

PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY

PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)

STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)

LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)

For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec

e

51

STACK(XZ)

~ CSTACKCYZ) PICKLPOi)

PCTDOW(Y)

Figure 49 Diseovered (alto-operator

system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy

vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be

~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex

45 Eltperiment 5 Data Abstraction and Feature Formation

EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he

cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3

arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng

on the same Tu3S Instruments Eplorer as the SLBDCE system

Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les

given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores

to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in

the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of

178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J

5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C

SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c

61

--

~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll

Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~

hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH

program searCles for tWo tpes of substrucure In tbe blocks world domain The first type

involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a

set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by

the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes

of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall

1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he

ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of

SlBDLE

lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S

cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce

OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence

~ ~ shyB

I ~ (

E G- c c=7 0 IA B CD

i Lr

(a) ( )

63

A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1

0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL

E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL

The tommcm-realiotships-list contairs the four relations Ossessed by more than half the

candidates

Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt

~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le

Sumber of propUiH In Intltwto

Number of propenlH n Ilnlon

vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-

ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre

A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5

65

~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e

ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~

difficult Running both tbe sequence and common relation discovery processes nore tban once

mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl

difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd

domain

T~e final comFarlson of 11e two syseS Involves ~he representation used to store the

discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the

subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe

sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot

lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1

roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse

Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines

comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe

semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto

oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0

SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert

background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l

more general epresentaticn such as the semantic network used y ~he ARCH pro~ran

53 COfDitive Optimization in olrs S~R Program

R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte

~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt

[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S

Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-

-(5-s)C =---shy

s

slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s

represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1

polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a

inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR

seeks a grammar whose eiementS m8llnize eVe

S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n

322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number

of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f

OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings

of he substrucre can be upressed as

Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy

cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not

conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on

he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into

~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture

lmprovements to tbe cognltive savings measure

~ Substructure Oiscovery ill Uihitehalls PLAD System

Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons

These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn

69

The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as

Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs

efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda

overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta

mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence

Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ

COXTEXT 1 ogsav macro OS

100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot

8 5 ~u - (~tmiddot z)middot

Instantiated Action Sequlnce lt B 112 Z ~11J Z

COlEXT 2 ogsav macro~s

20 ~l bull (~lumiddot Z)shy

Instantiated Action ~uence A B 11

Al interesting macrops discovered End

Figure 53 PLAXD Etam~le

1

OLPTER 6

COSC1USION

Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S

of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty

Url about such an environnent the entity nust abstract over unInuresting jetJil and discover

substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s

teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced

domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l

this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve

61 Summary

The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl

onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s

aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e

environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge

equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl

aleratves nvoiec In 3 computational method 01 discovering sustrucure

A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y

he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans

omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg

t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre

imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5

T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~

iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~

he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~

optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~

The ~ialization module modina tile substructure to apply in a more constrained ewironme

Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i

nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S

aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f

be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes

and the possible applications of the SLBDLE substructure discovery system in a vanety f

domaltls

Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf

~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-

lass vf problems in the area of nachine learning OJeraton In real-world domains demands f

eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates

n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll

of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes

Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1

S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~

62 Future Research

$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE

system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei

include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS

Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy

reVIOUS suess of silIlLu rJe appiiauns

esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of

substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can

tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]

lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can

be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions

in both the background knowledge architecture 3nd the opeations peforned by the discovery

algorlthm

Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess

may arise from a better method of substrlctie instantiation C rrent letbocs do not

satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation

by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C

lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres

exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba

Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods

would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery

process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt

the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be

discovered with a single application of the discovery algcnthr1

Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~

uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve

the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery

rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge

z~ess l ~

One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount

knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a

knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts

623 Applications

Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste

SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and

compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung

systems

One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage

composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e

eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon

slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of

abstraction

The n~ to access information fom larse databases has betooe oonlonlac In ncs

omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so

do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver

repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard

Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte

database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture

sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1

9

REFERE~CES

[Doyie79J

[HoaS3]

L --]~ ason

~Iicalski33bl

G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)

J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162

J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27

R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288

K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987

W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933

W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947

J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J

General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16

J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977

D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC

~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239

R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy

R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13

R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363

S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3

T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50

J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp

~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~

83

- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es

DeJ IS 11

dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T

s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11

nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11

race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T

Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order

=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je

~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form

(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I

The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC

The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure

ltela01s s the list ~f ~eatlons deining le substrucure r occurenc

In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage

dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec

85

gt bull ~

)~c1 tt I

0 c5 ~ -

13 Output for First Example of Experiment 1 Limit - 6

3qll rilebullbullbullbull

anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~

Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll

S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi

-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I

(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l

Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD

-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i

S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)

11 aCC

A14 Output for First Example of E(periment 1 Limit 10

gt svIdue liait 10)

n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t

Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco

T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~

S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO

~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS

~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J

A 16 Input for Second Example ot Experiment 1

Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J

conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)

Al7 Output for Second Example ot Etperitnent 1 Limit - 4

l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I

5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs

c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I

middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt

middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I

So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS

coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j

ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103

~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~

89

S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a

ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO

~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H

ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09

lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()

~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull

[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I

l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i

[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~

C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i

SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1

co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i

middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))

Ec ~lce

Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10

Betti ~Ice

i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~

umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1

SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -

1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G

91

s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no

TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~

~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f

=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ

~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i

A2 Experiment 2

Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd

saostructure background knowledge nodule Two examples from the chemical dooain ae

eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng

previously disc ered subst-uc~ures to subsequent discovery tasks

11 Input for First Eumple of poundXperimenl 2

kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i

S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))

slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)

sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J

~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI

Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))

VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)

( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I

(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i

(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I

sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)

Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j

SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)

~o e ) ~Wlnt ubstUC~oltes

Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo

95

gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~

U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l

~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J

carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )

Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))

Defxollrii ilill J ~cIl Clr carJ)

I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)

Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J

ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))

A32 Output for Experiment 3

gt lubclue inl~ JO)

Seli tlcebullbull_

m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil

SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo

aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot

ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~

licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)

Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc

wri OCCtilJtOiCS

101

4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4

~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i

(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))

42 Output for Experiment 4

PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil

Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)

ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl

u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1

O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~

s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI

W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -

~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a

S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1

TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a

103

~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~

Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-

bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)

ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)

I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))

r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l

(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)

(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)

coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)

Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i

(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at

Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e

(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ

~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj

OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs

I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten

~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull

9

SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1

I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1

Addina substructures 0 SKbullbullbull

O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1

5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I

A3 Experimeut 3

Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15

hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le

desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e

99

sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~

5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~

1~Ie c2 lt6 ~u)

QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32

c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)

lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J

SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))

A52 Output for Experiment S

gt (sulxh bullbull lmu )

Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll

RUMinl sulntucture discovey

DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl

Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~

WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11

( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I

(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I

[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt

~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90

Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J

ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot

~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I

10$

bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D

icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)

([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD

((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I

([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~

101

middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a

oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0

CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1

Ed ~lCe

109

Page 10: DISCOVERING SUBSTRUCTURE IN EXAMPLES

CHAPTER 2

stBSTRUCTtJRE DISCOVERY

Te major focus of t1is work IS to investigate methods for discoveing substructure concepts

in examples Substructure dlScovery is tle process of identfYlng concepts desclbl1g n~ere~tIni

and repetitive -hunks- of structure within the individual elements of a set of structured examples

Once such 3 substructure oncept s discovered the descrIptions of the examples can be Simplified

by replacing all occurences of ~he substructure ith a single form that represents the

substructure The simplified descriptions may then be passed to other learning syst~ms In

lddition tbe discovery process nay be apFlied repetitively to urther simplify the deserlptions or

to build a hierarchical interpreuticn of the uamples in terms of hell subparts

ThIS hapter presents ~he components involved in a cgtmilItatlonal nethod for SiJostrJcture

discovery SKtion 21 disc1sses the motivations for developing suet a method anc he importance

of substructure discovery in the 5eld of artificial intelligence Seton 22 1eftnes a stbn~crWt

along with the components comprising a substructure [etods for geneatlng alte-1ative

substructures are presented 1ft Section 21 and the tK1niques used for inteHigently selecting among

these alternatives are discussed in Section 24 Once an appropriate substructure 15 discovered the

substructure is irurantUud for ncb occur-ence of the substructure in the input examples Sect~on

2 discusses substructure irstantiatl0ft ewly discovered sUOstuetures ~an be ipecialized to

describe larger substruc~ures in the input eumples Section 26 disc~sses suosrlctlre

s~aization Section 27 presents methods for using background lo~ledge in the SIoStrlctlre

dlscovery process Finllly Section 28 outlines a substluure Es-oery alsolthm lncorpcrawg

the previously described t~hni~ues

the observaton By tercelvmg only he appropriate Informatior lumans U~ bener abie to el-

the concepts implied by the environment

Thus the undedyu~ motivations for investigating substructure ~overy are the IblqlJlto~

hlenrchical structure in the world and 1e ease with Ntllch bumans can negotiate the henrchy

With an appropriate representation for substructures and the substJcture hierarchy a

substructure discovery sysem can perform the asks Just presented

22 Substructure Representation

A substructure is both a portion of a collection of structurally-related obects and a

cesclption of the concept represented by that portion For uample the detaied structure

CJmposlng ttle concet of a brIck is a substucure of a brck wall Tle collecion of atoms ard

bones compnslng he concept of an aromatic -ing 5 a substrucure of nany Clrgarllc ~cmpouncs

However substrucure concepts are ~ot always nteresng LPOl encountenng aJr1ck wal the

concept of a brick nay not be as interesting as tohe oncept Jf a doorway or Window T~e task of

SUls-ructure discovery is to ~nd i11lDurirtg subsluctue middotoncepts In a given 5-cicltlon d

structurally-related objects The representation of strllcured oncepts sbould be onducive to tile

wk of discovering substructures This sectior resents a grapblcal repfese~tatlon for suctufee

conceptS and a language for describing tne concepts

A coUlCtion of strJctured objects C3n be epfesentec by 1 sTaph CSlng il gnpl

representation a substJctlre is a collection of annotated nodes and eages comt=rlslng a onne~ed

slt~rlph J a larier jTaph The nodes epresenl single oects aics or eiltltols in the

Slbstuc~le lnd tlle edges -epresent the ion~lCtilrs emiddotmiddoteen ea~cns lnd ler arglrents Fr

annctated Ntb gt and o~e In~tl~ed Nltll gtft Te en odt s rnected c ~tl te a ngtce Jnd te

SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St

elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute

Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e

Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the

nput nampie n Figure 211 IS

lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt

red-I ill

green-i SI ~ ~------------~~~lt- --- OBJECT-0001

) ----Rl

1amp --- OBJECT-0002 amp-blue

I~l 154- red LshyI I

I j

blue blue

(1) I1put Example ()) SubstJcure

Fig1Jre 21 Exa~plt Substrlc~re

9

[nput Example Substlcture

A)

V

(b) Object Overlapping Substructure

(c) Object and Relation Overlapping Substructure

Figure 22 Disjoint and Overlappin Substructures

Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~

~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the

densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle

lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1

factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl

Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e

(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing

subsuIctures and substructure speialization to attach contextual nformation to newiy discovered

substructures

2S Substructure mstampl1tialioD

Once an interesting substructure is disltovered the input eumples can be recast by replactng

the obJects and relations of each occurrence of the substructure with a single entity representing

tbe abstact substructure This replacement is termed substructure illstanliatwlt After

pltrforming one substructure instantiation a substrueture discovery system may continue to

discover more abstract substructures in terms of those already instantiated in the input eumples

This section considers two methods of substrueture instantiation

One method of substructure instantiation replaees eaeb occurrence by a single object

Difficulties in using this method arise from representation roblems Although all the ob]fCts and

relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not

replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be

maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only

confounds the insuntiaion problem In the case of object overlap each instartation of the

overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of

relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as

well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre

instantiation using a singie new object Although single object substructure instantiation seens

intuitively promising tbe accompanying representation problems are difficult to overcome

An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of

the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the

slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the

17

3 SubStructure Generation

AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~

and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l

bottom-up and top-down

The bottom-up approach to substructure generation be~lns with the smallest substrJctures n

the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished

by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0

the substructure For enmple according to tbe tllee neighboring relations (presented in Section

2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the

substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures

lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt

lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt

lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt

The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O

S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For

example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10

suestrJures

ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt

The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie

suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0

smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be

11

lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1

jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_

uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly

dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec

substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1

are applicable here

Regardless of how tae metbods are applied each metbod bas advantages and disadvantages

Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev

SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle

combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a

larger substructure but a smaller more desirable substructure may be overlooked in the process

Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS

referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones

linimal expansion begins with smaller substructures expands substructures along one relation

lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource

imlts of tbe system

2~ Substructure Selection

Aier using tbe metbods of the previous sectlon to construct l set of alternatlve

substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0

onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e

roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie

substructures oased on ~her heuristic quality This section presents the major heuristics that 3re

Jpplicable to substructure evaluation

The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata

cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie

savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e

13

Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad

eiatlons of ~be origmal OCClrre1lces may be reconstructea

T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=

replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla

obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue

ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of

concepts deAned In teClS of the Instantiated substructures

26 Substructure Specialization

Specializing substructures is an essential capabtlity of a substructure discovery system For

Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f

mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences

Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the

aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached

atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this

specialization step allows the substructure discovery system to take advantage of additional

information in the input examples avoid learning overly general substructure concepts and Clore

rapidly discover a specific disjunctive substructure concet T115 section resents an approach to

substructure specialization

One technique for specializing a substructure is to ~rform inductive Inference on the

extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the

substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended

occurrences consists of the substructures obtained by upanding each occurrence llth each

neighboring relation of the occurrence Once the set of eltended occurrences is onstructed

Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added

neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg

relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres

19

2 - Background Knowledampe

Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s

wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e

discovery process along more promising paths through tbe space of alternatlve substructJres T~lS

section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate

substructures that are more likely to list in tbe current application dOClaln

The ~wo major functions of tbe substructure backround knowledge are to main tain boh

lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a

glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of

substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu

substructures are defined in terms of more primitive substructures This arrangement suggests ~n

archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie

substructures serve as the justifications for more complex substructures at a higher level in tle

hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote

substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie

substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures

elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying

tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput

uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are

ultimately justified by the relations in tbe input eumples The substructure discoey roess

may then use these background knowledge substructures as a starting point for ieneratllg

alternative substructures

Other f Jrms of background knowledge apply to the task of substrlcure ilscovery

mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses

~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee

1

L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do

BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do

if (not (member e 0raquo S S U leI

return D

Figure 24 SubstrUcture Discovery ~lgoritbm

Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom

the substructures In S until either the computational limlt L is exceeded or the set of candidate

substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe

heuristics of Section 2A are employed to choose the best substructure from the llternatives in S

The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he

implementation Section 322 descibes the comrutatlon used in StBDlE although otbe

methods may be used For instance the heutlStics could be weighted selected by lhe user or

selected by lhe backcround knowledae However uy substructure selecion method should

involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in

BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of

discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi

Section 23 are then used to construct a set of new substructures that are stored in E Those

subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the

loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res

23

CHAPTER 3

mE SUBDUE SYSTE~I

An implementation of he substructure discovery methodology descrIbed in the frevous

chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments

Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a

substructure concept discoverer and as a mod le in a more robust nachlne learning system In

addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a

sbstructure specialization module and a substructure background knowledge module for utiiizilg

prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd

krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres

ovillCh of these substruc1res are resent in the lnput examples

i

SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of

SubstructJre Speeializer-k of(

C ser-Supplied Input Exampies Substructure Discovery

Figure 31 Tlle SlSDLE System

(a) SUbstructure

[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]

(b) External Representation

I I

~ Sl

i V

~-T t -

I

~ Cl i

(c) Inte~al Represe~tation

Figure 32 Substructure Representation

21

SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do

SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI

return -EWSlBS

Figure 33 Substructure Elpanslon Algorithm

Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae

stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are

stored in For each neighboring relation a new substructure is formed by adding the nelghborrg

-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to

the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een

considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes

~xanded from the original substructure

322 Selection

The heuristic-based substructure discovery module selects for orslderation those

substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs

compactness connectivity and covenge These four heurlstus are used 0 order he set of

alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut

ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~

selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons

~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he

eurstlc value of a substructur

29

urlent set of Input xam-llS

deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e

substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le

Input examples

r rtlations Sl compactnns(S) = ----shy

For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus

conpactness bull 4 bull 1

The third heUristic connectivity measures the amount of extemal connection in 1he

occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of

external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be

onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed

connectivit-Spound) = locel

Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In

Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure

22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO

outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons

Thus connectivity (12 41 bull 13

T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es

31

ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt

First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~

tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as

there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle

substructure as denned in Section 3U The Object names within the substructures le aroltrar

symbols generated by the system for eacb ewly constructed substructure

S bull i lt 11 OBJECT-0001gt f

~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S

and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by

addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5

ae st)red In E

Input Example Substructure

amp i 511

~I Rl I- lSi ~

52 I S I

LJ L

Figure 3-1 Simple Example

33

11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~

1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l

le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~

suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~

by the algorabm

33 Sllbstructure Specialization

The substructure specialiution module In SLBDLE employs t simple teChlllq ue r

s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25

SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added

attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e

)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh

the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e

i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota

eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d

knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e

s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton

rocess

331 Specia1ization Algorithm

Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te

algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~

elamrles

he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~

Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l

osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~

eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e

35

~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor

Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle

mount of specializatlon In Srpc

A-rount_of_spKlaLization( S_) = --_______ (occl

The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture

acKiround knowledge along w1tb the originally discovered substrucure

332 Example

As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a

lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute

-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he

sare substruc~ure emerges as in Figure 3-

S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt

ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule

Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES

A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l

~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli

s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt

Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~

SPECSLBS In thIs example IS

S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt

T~is specialized substructure also las J occurrerces but 3 untque values thus

amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC

specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he

substrucnre background knowledge along with the oriiinally discovered substucur

SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon

about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be

s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred

substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq

SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue

nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization

nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be

mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec

s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks

3 SubstMlcture Background Kllowledge

T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts

39

O T SHA ETRL~GLE SHAPE-SQLARE

Figure 37 Substructure Backgolnd KnowledgeEumpie

ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt

su~ported

As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge

lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle

ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s

SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y

Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute

est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt

he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn

n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and

~1

1--- ~ed v blue ~

i I shyI

I~

I

I

I

I

SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE

Fgure 39 Three Background Knoledge $ucstroJcJres

he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e

new substructu-es In terms of the substructures already known

3 bull 42 IdentifyiD1 Substructures in Eumples

As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-

~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___

~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I

i SH~PETRIAGlE i t

I I

I I I SHAPEmiddotSQlARE

[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]

Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]

Fig-ure 310 Substructure IdentiAcation Eumple

substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles

disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks

--

lll=H Eumple - r

i III

I I H

8r- shyj

V V

Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red

C Value(S)a8

L 6 a

al uelt S)-312

alueltS)-21

0 alue(S)96

middotalue(S)-11

6 I

bt

10 ~ V

- -

I

alue(S)middot31~ aue(S~-96 -- - middotalue(S)92

A I

j

aI ValueS)-312 I

I I

I bull

---

- -I

Vaiue(S)-103

rgure 41 First Etample of EXiIrinent 1

elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS

of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll

EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$

overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0

consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r

be Input examples

2 Ex~riment 2 Specialization and Background Knowledge

The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem

Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0

~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS

lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege

During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task

As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut

discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs

acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa

Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed

subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk

The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure

-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e

cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s

shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese

discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized

substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed

rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces

the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed

substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe

background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to

deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe

background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres

Ii

C 0

H- C C - Ii C1

(Sr C1 rl

C C ~

C C C-H C C- ii C C-Ii 1 i

~ Sr- C C C

H-C C-Ii H C

Ii H

H

fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule

13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples

tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~

of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna

clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn

whIch to focus ile attentIon of the conceptual clusterI~ rocess

A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA

Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer

lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific

Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e

added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features

t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and

CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered

Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe

SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas

r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single

)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d

subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle

substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure

spannng ncre han one ualple

Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni

and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle

53

nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy

descrlOed il [10gensel87J the two best d usterngs discovered are

Color of engine Aheels IS Blackmiddot middotWhitemiddot

W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd

subs-~c~~re discovery =odule is

lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt

In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By

addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI

exa~Fles Je ~wo best lusterngs discovered lre

umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four

Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec

JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure

attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen

har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE

T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~

suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the

StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne

IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree

that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators

within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover

macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~

CiscJsses similrlties and differences between SLBDLE and PLoSD

Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure

of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The

Jperators forths conaln are taken from [isson80] and are repeated e10w

PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY

PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)

STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)

LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)

For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec

e

51

STACK(XZ)

~ CSTACKCYZ) PICKLPOi)

PCTDOW(Y)

Figure 49 Diseovered (alto-operator

system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy

vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be

~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex

45 Eltperiment 5 Data Abstraction and Feature Formation

EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he

cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3

arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng

on the same Tu3S Instruments Eplorer as the SLBDCE system

Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les

given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores

to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in

the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of

178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J

5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C

SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c

61

--

~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll

Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~

hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH

program searCles for tWo tpes of substrucure In tbe blocks world domain The first type

involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a

set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by

the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes

of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall

1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he

ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of

SlBDLE

lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S

cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce

OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence

~ ~ shyB

I ~ (

E G- c c=7 0 IA B CD

i Lr

(a) ( )

63

A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1

0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL

E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL

The tommcm-realiotships-list contairs the four relations Ossessed by more than half the

candidates

Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt

~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le

Sumber of propUiH In Intltwto

Number of propenlH n Ilnlon

vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-

ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre

A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5

65

~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e

ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~

difficult Running both tbe sequence and common relation discovery processes nore tban once

mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl

difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd

domain

T~e final comFarlson of 11e two syseS Involves ~he representation used to store the

discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the

subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe

sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot

lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1

roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse

Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines

comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe

semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto

oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0

SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert

background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l

more general epresentaticn such as the semantic network used y ~he ARCH pro~ran

53 COfDitive Optimization in olrs S~R Program

R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte

~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt

[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S

Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-

-(5-s)C =---shy

s

slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s

represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1

polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a

inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR

seeks a grammar whose eiementS m8llnize eVe

S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n

322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number

of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f

OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings

of he substrucre can be upressed as

Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy

cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not

conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on

he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into

~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture

lmprovements to tbe cognltive savings measure

~ Substructure Oiscovery ill Uihitehalls PLAD System

Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons

These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn

69

The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as

Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs

efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda

overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta

mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence

Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ

COXTEXT 1 ogsav macro OS

100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot

8 5 ~u - (~tmiddot z)middot

Instantiated Action Sequlnce lt B 112 Z ~11J Z

COlEXT 2 ogsav macro~s

20 ~l bull (~lumiddot Z)shy

Instantiated Action ~uence A B 11

Al interesting macrops discovered End

Figure 53 PLAXD Etam~le

1

OLPTER 6

COSC1USION

Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S

of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty

Url about such an environnent the entity nust abstract over unInuresting jetJil and discover

substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s

teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced

domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l

this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve

61 Summary

The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl

onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s

aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e

environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge

equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl

aleratves nvoiec In 3 computational method 01 discovering sustrucure

A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y

he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans

omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg

t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre

imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5

T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~

iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~

he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~

optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~

The ~ialization module modina tile substructure to apply in a more constrained ewironme

Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i

nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S

aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f

be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes

and the possible applications of the SLBDLE substructure discovery system in a vanety f

domaltls

Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf

~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-

lass vf problems in the area of nachine learning OJeraton In real-world domains demands f

eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates

n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll

of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes

Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1

S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~

62 Future Research

$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE

system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei

include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS

Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy

reVIOUS suess of silIlLu rJe appiiauns

esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of

substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can

tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]

lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can

be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions

in both the background knowledge architecture 3nd the opeations peforned by the discovery

algorlthm

Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess

may arise from a better method of substrlctie instantiation C rrent letbocs do not

satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation

by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C

lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres

exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba

Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods

would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery

process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt

the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be

discovered with a single application of the discovery algcnthr1

Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~

uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve

the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery

rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge

z~ess l ~

One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount

knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a

knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts

623 Applications

Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste

SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and

compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung

systems

One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage

composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e

eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon

slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of

abstraction

The n~ to access information fom larse databases has betooe oonlonlac In ncs

omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so

do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver

repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard

Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte

database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture

sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1

9

REFERE~CES

[Doyie79J

[HoaS3]

L --]~ ason

~Iicalski33bl

G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)

J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162

J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27

R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288

K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987

W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933

W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947

J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J

General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16

J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977

D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC

~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239

R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy

R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13

R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363

S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3

T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50

J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp

~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~

83

- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es

DeJ IS 11

dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T

s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11

nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11

race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T

Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order

=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je

~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form

(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I

The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC

The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure

ltela01s s the list ~f ~eatlons deining le substrucure r occurenc

In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage

dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec

85

gt bull ~

)~c1 tt I

0 c5 ~ -

13 Output for First Example of Experiment 1 Limit - 6

3qll rilebullbullbullbull

anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~

Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll

S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi

-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I

(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l

Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD

-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i

S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)

11 aCC

A14 Output for First Example of E(periment 1 Limit 10

gt svIdue liait 10)

n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t

Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco

T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~

S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO

~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS

~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J

A 16 Input for Second Example ot Experiment 1

Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J

conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)

Al7 Output for Second Example ot Etperitnent 1 Limit - 4

l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I

5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs

c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I

middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt

middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I

So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS

coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j

ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103

~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~

89

S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a

ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO

~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H

ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09

lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()

~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull

[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I

l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i

[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~

C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i

SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1

co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i

middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))

Ec ~lce

Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10

Betti ~Ice

i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~

umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1

SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -

1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G

91

s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no

TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~

~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f

=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ

~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i

A2 Experiment 2

Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd

saostructure background knowledge nodule Two examples from the chemical dooain ae

eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng

previously disc ered subst-uc~ures to subsequent discovery tasks

11 Input for First Eumple of poundXperimenl 2

kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i

S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))

slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)

sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J

~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI

Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))

VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)

( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I

(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i

(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I

sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)

Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j

SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)

~o e ) ~Wlnt ubstUC~oltes

Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo

95

gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~

U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l

~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J

carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )

Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))

Defxollrii ilill J ~cIl Clr carJ)

I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)

Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J

ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))

A32 Output for Experiment 3

gt lubclue inl~ JO)

Seli tlcebullbull_

m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil

SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo

aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot

ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~

licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)

Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc

wri OCCtilJtOiCS

101

4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4

~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i

(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))

42 Output for Experiment 4

PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil

Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)

ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl

u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1

O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~

s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI

W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -

~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a

S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1

TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a

103

~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~

Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-

bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)

ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)

I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))

r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l

(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)

(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)

coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)

Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i

(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at

Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e

(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ

~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj

OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs

I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten

~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull

9

SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1

I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1

Addina substructures 0 SKbullbullbull

O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1

5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I

A3 Experimeut 3

Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15

hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le

desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e

99

sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~

5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~

1~Ie c2 lt6 ~u)

QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32

c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)

lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J

SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))

A52 Output for Experiment S

gt (sulxh bullbull lmu )

Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll

RUMinl sulntucture discovey

DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl

Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~

WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11

( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I

(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I

[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt

~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90

Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J

ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot

~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I

10$

bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D

icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)

([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD

((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I

([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~

101

middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a

oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0

CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1

Ed ~lCe

109

Page 11: DISCOVERING SUBSTRUCTURE IN EXAMPLES

the observaton By tercelvmg only he appropriate Informatior lumans U~ bener abie to el-

the concepts implied by the environment

Thus the undedyu~ motivations for investigating substructure ~overy are the IblqlJlto~

hlenrchical structure in the world and 1e ease with Ntllch bumans can negotiate the henrchy

With an appropriate representation for substructures and the substJcture hierarchy a

substructure discovery sysem can perform the asks Just presented

22 Substructure Representation

A substructure is both a portion of a collection of structurally-related obects and a

cesclption of the concept represented by that portion For uample the detaied structure

CJmposlng ttle concet of a brIck is a substucure of a brck wall Tle collecion of atoms ard

bones compnslng he concept of an aromatic -ing 5 a substrucure of nany Clrgarllc ~cmpouncs

However substrucure concepts are ~ot always nteresng LPOl encountenng aJr1ck wal the

concept of a brick nay not be as interesting as tohe oncept Jf a doorway or Window T~e task of

SUls-ructure discovery is to ~nd i11lDurirtg subsluctue middotoncepts In a given 5-cicltlon d

structurally-related objects The representation of strllcured oncepts sbould be onducive to tile

wk of discovering substructures This sectior resents a grapblcal repfese~tatlon for suctufee

conceptS and a language for describing tne concepts

A coUlCtion of strJctured objects C3n be epfesentec by 1 sTaph CSlng il gnpl

representation a substJctlre is a collection of annotated nodes and eages comt=rlslng a onne~ed

slt~rlph J a larier jTaph The nodes epresenl single oects aics or eiltltols in the

Slbstuc~le lnd tlle edges -epresent the ion~lCtilrs emiddotmiddoteen ea~cns lnd ler arglrents Fr

annctated Ntb gt and o~e In~tl~ed Nltll gtft Te en odt s rnected c ~tl te a ngtce Jnd te

SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St

elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute

Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e

Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the

nput nampie n Figure 211 IS

lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt

red-I ill

green-i SI ~ ~------------~~~lt- --- OBJECT-0001

) ----Rl

1amp --- OBJECT-0002 amp-blue

I~l 154- red LshyI I

I j

blue blue

(1) I1put Example ()) SubstJcure

Fig1Jre 21 Exa~plt Substrlc~re

9

[nput Example Substlcture

A)

V

(b) Object Overlapping Substructure

(c) Object and Relation Overlapping Substructure

Figure 22 Disjoint and Overlappin Substructures

Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~

~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the

densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle

lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1

factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl

Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e

(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing

subsuIctures and substructure speialization to attach contextual nformation to newiy discovered

substructures

2S Substructure mstampl1tialioD

Once an interesting substructure is disltovered the input eumples can be recast by replactng

the obJects and relations of each occurrence of the substructure with a single entity representing

tbe abstact substructure This replacement is termed substructure illstanliatwlt After

pltrforming one substructure instantiation a substrueture discovery system may continue to

discover more abstract substructures in terms of those already instantiated in the input eumples

This section considers two methods of substrueture instantiation

One method of substructure instantiation replaees eaeb occurrence by a single object

Difficulties in using this method arise from representation roblems Although all the ob]fCts and

relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not

replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be

maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only

confounds the insuntiaion problem In the case of object overlap each instartation of the

overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of

relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as

well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre

instantiation using a singie new object Although single object substructure instantiation seens

intuitively promising tbe accompanying representation problems are difficult to overcome

An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of

the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the

slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the

17

3 SubStructure Generation

AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~

and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l

bottom-up and top-down

The bottom-up approach to substructure generation be~lns with the smallest substrJctures n

the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished

by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0

the substructure For enmple according to tbe tllee neighboring relations (presented in Section

2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the

substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures

lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt

lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt

lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt

The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O

S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For

example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10

suestrJures

ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt

The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie

suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0

smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be

11

lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1

jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_

uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly

dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec

substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1

are applicable here

Regardless of how tae metbods are applied each metbod bas advantages and disadvantages

Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev

SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle

combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a

larger substructure but a smaller more desirable substructure may be overlooked in the process

Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS

referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones

linimal expansion begins with smaller substructures expands substructures along one relation

lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource

imlts of tbe system

2~ Substructure Selection

Aier using tbe metbods of the previous sectlon to construct l set of alternatlve

substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0

onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e

roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie

substructures oased on ~her heuristic quality This section presents the major heuristics that 3re

Jpplicable to substructure evaluation

The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata

cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie

savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e

13

Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad

eiatlons of ~be origmal OCClrre1lces may be reconstructea

T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=

replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla

obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue

ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of

concepts deAned In teClS of the Instantiated substructures

26 Substructure Specialization

Specializing substructures is an essential capabtlity of a substructure discovery system For

Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f

mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences

Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the

aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached

atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this

specialization step allows the substructure discovery system to take advantage of additional

information in the input examples avoid learning overly general substructure concepts and Clore

rapidly discover a specific disjunctive substructure concet T115 section resents an approach to

substructure specialization

One technique for specializing a substructure is to ~rform inductive Inference on the

extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the

substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended

occurrences consists of the substructures obtained by upanding each occurrence llth each

neighboring relation of the occurrence Once the set of eltended occurrences is onstructed

Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added

neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg

relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres

19

2 - Background Knowledampe

Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s

wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e

discovery process along more promising paths through tbe space of alternatlve substructJres T~lS

section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate

substructures that are more likely to list in tbe current application dOClaln

The ~wo major functions of tbe substructure backround knowledge are to main tain boh

lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a

glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of

substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu

substructures are defined in terms of more primitive substructures This arrangement suggests ~n

archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie

substructures serve as the justifications for more complex substructures at a higher level in tle

hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote

substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie

substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures

elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying

tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput

uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are

ultimately justified by the relations in tbe input eumples The substructure discoey roess

may then use these background knowledge substructures as a starting point for ieneratllg

alternative substructures

Other f Jrms of background knowledge apply to the task of substrlcure ilscovery

mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses

~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee

1

L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do

BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do

if (not (member e 0raquo S S U leI

return D

Figure 24 SubstrUcture Discovery ~lgoritbm

Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom

the substructures In S until either the computational limlt L is exceeded or the set of candidate

substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe

heuristics of Section 2A are employed to choose the best substructure from the llternatives in S

The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he

implementation Section 322 descibes the comrutatlon used in StBDlE although otbe

methods may be used For instance the heutlStics could be weighted selected by lhe user or

selected by lhe backcround knowledae However uy substructure selecion method should

involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in

BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of

discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi

Section 23 are then used to construct a set of new substructures that are stored in E Those

subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the

loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res

23

CHAPTER 3

mE SUBDUE SYSTE~I

An implementation of he substructure discovery methodology descrIbed in the frevous

chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments

Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a

substructure concept discoverer and as a mod le in a more robust nachlne learning system In

addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a

sbstructure specialization module and a substructure background knowledge module for utiiizilg

prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd

krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres

ovillCh of these substruc1res are resent in the lnput examples

i

SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of

SubstructJre Speeializer-k of(

C ser-Supplied Input Exampies Substructure Discovery

Figure 31 Tlle SlSDLE System

(a) SUbstructure

[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]

(b) External Representation

I I

~ Sl

i V

~-T t -

I

~ Cl i

(c) Inte~al Represe~tation

Figure 32 Substructure Representation

21

SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do

SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI

return -EWSlBS

Figure 33 Substructure Elpanslon Algorithm

Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae

stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are

stored in For each neighboring relation a new substructure is formed by adding the nelghborrg

-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to

the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een

considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes

~xanded from the original substructure

322 Selection

The heuristic-based substructure discovery module selects for orslderation those

substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs

compactness connectivity and covenge These four heurlstus are used 0 order he set of

alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut

ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~

selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons

~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he

eurstlc value of a substructur

29

urlent set of Input xam-llS

deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e

substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le

Input examples

r rtlations Sl compactnns(S) = ----shy

For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus

conpactness bull 4 bull 1

The third heUristic connectivity measures the amount of extemal connection in 1he

occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of

external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be

onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed

connectivit-Spound) = locel

Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In

Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure

22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO

outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons

Thus connectivity (12 41 bull 13

T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es

31

ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt

First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~

tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as

there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle

substructure as denned in Section 3U The Object names within the substructures le aroltrar

symbols generated by the system for eacb ewly constructed substructure

S bull i lt 11 OBJECT-0001gt f

~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S

and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by

addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5

ae st)red In E

Input Example Substructure

amp i 511

~I Rl I- lSi ~

52 I S I

LJ L

Figure 3-1 Simple Example

33

11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~

1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l

le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~

suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~

by the algorabm

33 Sllbstructure Specialization

The substructure specialiution module In SLBDLE employs t simple teChlllq ue r

s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25

SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added

attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e

)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh

the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e

i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota

eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d

knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e

s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton

rocess

331 Specia1ization Algorithm

Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te

algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~

elamrles

he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~

Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l

osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~

eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e

35

~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor

Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle

mount of specializatlon In Srpc

A-rount_of_spKlaLization( S_) = --_______ (occl

The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture

acKiround knowledge along w1tb the originally discovered substrucure

332 Example

As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a

lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute

-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he

sare substruc~ure emerges as in Figure 3-

S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt

ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule

Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES

A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l

~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli

s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt

Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~

SPECSLBS In thIs example IS

S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt

T~is specialized substructure also las J occurrerces but 3 untque values thus

amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC

specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he

substrucnre background knowledge along with the oriiinally discovered substucur

SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon

about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be

s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred

substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq

SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue

nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization

nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be

mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec

s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks

3 SubstMlcture Background Kllowledge

T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts

39

O T SHA ETRL~GLE SHAPE-SQLARE

Figure 37 Substructure Backgolnd KnowledgeEumpie

ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt

su~ported

As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge

lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle

ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s

SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y

Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute

est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt

he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn

n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and

~1

1--- ~ed v blue ~

i I shyI

I~

I

I

I

I

SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE

Fgure 39 Three Background Knoledge $ucstroJcJres

he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e

new substructu-es In terms of the substructures already known

3 bull 42 IdentifyiD1 Substructures in Eumples

As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-

~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___

~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I

i SH~PETRIAGlE i t

I I

I I I SHAPEmiddotSQlARE

[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]

Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]

Fig-ure 310 Substructure IdentiAcation Eumple

substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles

disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks

--

lll=H Eumple - r

i III

I I H

8r- shyj

V V

Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red

C Value(S)a8

L 6 a

al uelt S)-312

alueltS)-21

0 alue(S)96

middotalue(S)-11

6 I

bt

10 ~ V

- -

I

alue(S)middot31~ aue(S~-96 -- - middotalue(S)92

A I

j

aI ValueS)-312 I

I I

I bull

---

- -I

Vaiue(S)-103

rgure 41 First Etample of EXiIrinent 1

elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS

of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll

EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$

overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0

consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r

be Input examples

2 Ex~riment 2 Specialization and Background Knowledge

The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem

Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0

~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS

lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege

During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task

As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut

discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs

acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa

Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed

subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk

The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure

-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e

cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s

shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese

discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized

substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed

rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces

the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed

substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe

background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to

deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe

background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres

Ii

C 0

H- C C - Ii C1

(Sr C1 rl

C C ~

C C C-H C C- ii C C-Ii 1 i

~ Sr- C C C

H-C C-Ii H C

Ii H

H

fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule

13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples

tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~

of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna

clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn

whIch to focus ile attentIon of the conceptual clusterI~ rocess

A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA

Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer

lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific

Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e

added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features

t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and

CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered

Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe

SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas

r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single

)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d

subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle

substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure

spannng ncre han one ualple

Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni

and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle

53

nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy

descrlOed il [10gensel87J the two best d usterngs discovered are

Color of engine Aheels IS Blackmiddot middotWhitemiddot

W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd

subs-~c~~re discovery =odule is

lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt

In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By

addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI

exa~Fles Je ~wo best lusterngs discovered lre

umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four

Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec

JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure

attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen

har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE

T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~

suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the

StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne

IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree

that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators

within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover

macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~

CiscJsses similrlties and differences between SLBDLE and PLoSD

Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure

of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The

Jperators forths conaln are taken from [isson80] and are repeated e10w

PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY

PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)

STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)

LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)

For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec

e

51

STACK(XZ)

~ CSTACKCYZ) PICKLPOi)

PCTDOW(Y)

Figure 49 Diseovered (alto-operator

system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy

vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be

~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex

45 Eltperiment 5 Data Abstraction and Feature Formation

EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he

cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3

arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng

on the same Tu3S Instruments Eplorer as the SLBDCE system

Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les

given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores

to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in

the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of

178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J

5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C

SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c

61

--

~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll

Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~

hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH

program searCles for tWo tpes of substrucure In tbe blocks world domain The first type

involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a

set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by

the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes

of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall

1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he

ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of

SlBDLE

lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S

cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce

OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence

~ ~ shyB

I ~ (

E G- c c=7 0 IA B CD

i Lr

(a) ( )

63

A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1

0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL

E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL

The tommcm-realiotships-list contairs the four relations Ossessed by more than half the

candidates

Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt

~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le

Sumber of propUiH In Intltwto

Number of propenlH n Ilnlon

vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-

ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre

A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5

65

~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e

ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~

difficult Running both tbe sequence and common relation discovery processes nore tban once

mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl

difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd

domain

T~e final comFarlson of 11e two syseS Involves ~he representation used to store the

discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the

subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe

sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot

lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1

roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse

Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines

comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe

semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto

oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0

SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert

background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l

more general epresentaticn such as the semantic network used y ~he ARCH pro~ran

53 COfDitive Optimization in olrs S~R Program

R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte

~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt

[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S

Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-

-(5-s)C =---shy

s

slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s

represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1

polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a

inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR

seeks a grammar whose eiementS m8llnize eVe

S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n

322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number

of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f

OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings

of he substrucre can be upressed as

Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy

cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not

conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on

he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into

~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture

lmprovements to tbe cognltive savings measure

~ Substructure Oiscovery ill Uihitehalls PLAD System

Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons

These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn

69

The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as

Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs

efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda

overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta

mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence

Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ

COXTEXT 1 ogsav macro OS

100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot

8 5 ~u - (~tmiddot z)middot

Instantiated Action Sequlnce lt B 112 Z ~11J Z

COlEXT 2 ogsav macro~s

20 ~l bull (~lumiddot Z)shy

Instantiated Action ~uence A B 11

Al interesting macrops discovered End

Figure 53 PLAXD Etam~le

1

OLPTER 6

COSC1USION

Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S

of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty

Url about such an environnent the entity nust abstract over unInuresting jetJil and discover

substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s

teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced

domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l

this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve

61 Summary

The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl

onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s

aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e

environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge

equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl

aleratves nvoiec In 3 computational method 01 discovering sustrucure

A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y

he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans

omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg

t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre

imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5

T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~

iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~

he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~

optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~

The ~ialization module modina tile substructure to apply in a more constrained ewironme

Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i

nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S

aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f

be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes

and the possible applications of the SLBDLE substructure discovery system in a vanety f

domaltls

Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf

~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-

lass vf problems in the area of nachine learning OJeraton In real-world domains demands f

eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates

n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll

of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes

Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1

S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~

62 Future Research

$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE

system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei

include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS

Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy

reVIOUS suess of silIlLu rJe appiiauns

esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of

substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can

tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]

lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can

be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions

in both the background knowledge architecture 3nd the opeations peforned by the discovery

algorlthm

Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess

may arise from a better method of substrlctie instantiation C rrent letbocs do not

satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation

by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C

lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres

exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba

Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods

would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery

process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt

the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be

discovered with a single application of the discovery algcnthr1

Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~

uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve

the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery

rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge

z~ess l ~

One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount

knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a

knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts

623 Applications

Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste

SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and

compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung

systems

One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage

composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e

eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon

slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of

abstraction

The n~ to access information fom larse databases has betooe oonlonlac In ncs

omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so

do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver

repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard

Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte

database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture

sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1

9

REFERE~CES

[Doyie79J

[HoaS3]

L --]~ ason

~Iicalski33bl

G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)

J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162

J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27

R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288

K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987

W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933

W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947

J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J

General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16

J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977

D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC

~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239

R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy

R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13

R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363

S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3

T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50

J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp

~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~

83

- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es

DeJ IS 11

dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T

s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11

nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11

race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T

Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order

=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je

~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form

(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I

The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC

The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure

ltela01s s the list ~f ~eatlons deining le substrucure r occurenc

In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage

dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec

85

gt bull ~

)~c1 tt I

0 c5 ~ -

13 Output for First Example of Experiment 1 Limit - 6

3qll rilebullbullbullbull

anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~

Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll

S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi

-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I

(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l

Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD

-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i

S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)

11 aCC

A14 Output for First Example of E(periment 1 Limit 10

gt svIdue liait 10)

n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t

Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco

T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~

S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO

~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS

~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J

A 16 Input for Second Example ot Experiment 1

Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J

conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)

Al7 Output for Second Example ot Etperitnent 1 Limit - 4

l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I

5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs

c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I

middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt

middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I

So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS

coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j

ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103

~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~

89

S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a

ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO

~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H

ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09

lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()

~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull

[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I

l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i

[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~

C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i

SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1

co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i

middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))

Ec ~lce

Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10

Betti ~Ice

i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~

umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1

SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -

1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G

91

s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no

TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~

~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f

=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ

~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i

A2 Experiment 2

Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd

saostructure background knowledge nodule Two examples from the chemical dooain ae

eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng

previously disc ered subst-uc~ures to subsequent discovery tasks

11 Input for First Eumple of poundXperimenl 2

kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i

S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))

slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)

sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J

~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI

Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))

VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)

( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I

(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i

(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I

sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)

Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j

SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)

~o e ) ~Wlnt ubstUC~oltes

Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo

95

gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~

U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l

~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J

carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )

Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))

Defxollrii ilill J ~cIl Clr carJ)

I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)

Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J

ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))

A32 Output for Experiment 3

gt lubclue inl~ JO)

Seli tlcebullbull_

m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil

SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo

aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot

ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~

licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)

Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc

wri OCCtilJtOiCS

101

4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4

~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i

(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))

42 Output for Experiment 4

PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil

Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)

ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl

u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1

O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~

s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI

W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -

~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a

S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1

TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a

103

~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~

Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-

bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)

ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)

I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))

r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l

(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)

(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)

coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)

Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i

(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at

Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e

(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ

~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj

OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs

I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten

~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull

9

SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1

I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1

Addina substructures 0 SKbullbullbull

O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1

5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I

A3 Experimeut 3

Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15

hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le

desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e

99

sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~

5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~

1~Ie c2 lt6 ~u)

QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32

c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)

lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J

SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))

A52 Output for Experiment S

gt (sulxh bullbull lmu )

Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll

RUMinl sulntucture discovey

DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl

Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~

WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11

( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I

(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I

[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt

~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90

Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J

ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot

~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I

10$

bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D

icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)

([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD

((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I

([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~

101

middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a

oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0

CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1

Ed ~lCe

109

Page 12: DISCOVERING SUBSTRUCTURE IN EXAMPLES

SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St

elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute

Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e

Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the

nput nampie n Figure 211 IS

lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt

red-I ill

green-i SI ~ ~------------~~~lt- --- OBJECT-0001

) ----Rl

1amp --- OBJECT-0002 amp-blue

I~l 154- red LshyI I

I j

blue blue

(1) I1put Example ()) SubstJcure

Fig1Jre 21 Exa~plt Substrlc~re

9

[nput Example Substlcture

A)

V

(b) Object Overlapping Substructure

(c) Object and Relation Overlapping Substructure

Figure 22 Disjoint and Overlappin Substructures

Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~

~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the

densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle

lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1

factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl

Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e

(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing

subsuIctures and substructure speialization to attach contextual nformation to newiy discovered

substructures

2S Substructure mstampl1tialioD

Once an interesting substructure is disltovered the input eumples can be recast by replactng

the obJects and relations of each occurrence of the substructure with a single entity representing

tbe abstact substructure This replacement is termed substructure illstanliatwlt After

pltrforming one substructure instantiation a substrueture discovery system may continue to

discover more abstract substructures in terms of those already instantiated in the input eumples

This section considers two methods of substrueture instantiation

One method of substructure instantiation replaees eaeb occurrence by a single object

Difficulties in using this method arise from representation roblems Although all the ob]fCts and

relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not

replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be

maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only

confounds the insuntiaion problem In the case of object overlap each instartation of the

overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of

relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as

well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre

instantiation using a singie new object Although single object substructure instantiation seens

intuitively promising tbe accompanying representation problems are difficult to overcome

An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of

the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the

slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the

17

3 SubStructure Generation

AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~

and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l

bottom-up and top-down

The bottom-up approach to substructure generation be~lns with the smallest substrJctures n

the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished

by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0

the substructure For enmple according to tbe tllee neighboring relations (presented in Section

2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the

substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures

lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt

lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt

lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt

The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O

S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For

example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10

suestrJures

ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt

The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie

suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0

smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be

11

lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1

jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_

uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly

dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec

substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1

are applicable here

Regardless of how tae metbods are applied each metbod bas advantages and disadvantages

Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev

SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle

combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a

larger substructure but a smaller more desirable substructure may be overlooked in the process

Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS

referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones

linimal expansion begins with smaller substructures expands substructures along one relation

lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource

imlts of tbe system

2~ Substructure Selection

Aier using tbe metbods of the previous sectlon to construct l set of alternatlve

substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0

onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e

roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie

substructures oased on ~her heuristic quality This section presents the major heuristics that 3re

Jpplicable to substructure evaluation

The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata

cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie

savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e

13

Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad

eiatlons of ~be origmal OCClrre1lces may be reconstructea

T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=

replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla

obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue

ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of

concepts deAned In teClS of the Instantiated substructures

26 Substructure Specialization

Specializing substructures is an essential capabtlity of a substructure discovery system For

Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f

mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences

Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the

aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached

atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this

specialization step allows the substructure discovery system to take advantage of additional

information in the input examples avoid learning overly general substructure concepts and Clore

rapidly discover a specific disjunctive substructure concet T115 section resents an approach to

substructure specialization

One technique for specializing a substructure is to ~rform inductive Inference on the

extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the

substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended

occurrences consists of the substructures obtained by upanding each occurrence llth each

neighboring relation of the occurrence Once the set of eltended occurrences is onstructed

Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added

neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg

relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres

19

2 - Background Knowledampe

Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s

wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e

discovery process along more promising paths through tbe space of alternatlve substructJres T~lS

section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate

substructures that are more likely to list in tbe current application dOClaln

The ~wo major functions of tbe substructure backround knowledge are to main tain boh

lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a

glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of

substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu

substructures are defined in terms of more primitive substructures This arrangement suggests ~n

archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie

substructures serve as the justifications for more complex substructures at a higher level in tle

hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote

substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie

substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures

elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying

tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput

uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are

ultimately justified by the relations in tbe input eumples The substructure discoey roess

may then use these background knowledge substructures as a starting point for ieneratllg

alternative substructures

Other f Jrms of background knowledge apply to the task of substrlcure ilscovery

mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses

~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee

1

L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do

BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do

if (not (member e 0raquo S S U leI

return D

Figure 24 SubstrUcture Discovery ~lgoritbm

Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom

the substructures In S until either the computational limlt L is exceeded or the set of candidate

substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe

heuristics of Section 2A are employed to choose the best substructure from the llternatives in S

The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he

implementation Section 322 descibes the comrutatlon used in StBDlE although otbe

methods may be used For instance the heutlStics could be weighted selected by lhe user or

selected by lhe backcround knowledae However uy substructure selecion method should

involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in

BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of

discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi

Section 23 are then used to construct a set of new substructures that are stored in E Those

subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the

loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res

23

CHAPTER 3

mE SUBDUE SYSTE~I

An implementation of he substructure discovery methodology descrIbed in the frevous

chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments

Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a

substructure concept discoverer and as a mod le in a more robust nachlne learning system In

addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a

sbstructure specialization module and a substructure background knowledge module for utiiizilg

prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd

krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres

ovillCh of these substruc1res are resent in the lnput examples

i

SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of

SubstructJre Speeializer-k of(

C ser-Supplied Input Exampies Substructure Discovery

Figure 31 Tlle SlSDLE System

(a) SUbstructure

[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]

(b) External Representation

I I

~ Sl

i V

~-T t -

I

~ Cl i

(c) Inte~al Represe~tation

Figure 32 Substructure Representation

21

SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do

SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI

return -EWSlBS

Figure 33 Substructure Elpanslon Algorithm

Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae

stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are

stored in For each neighboring relation a new substructure is formed by adding the nelghborrg

-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to

the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een

considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes

~xanded from the original substructure

322 Selection

The heuristic-based substructure discovery module selects for orslderation those

substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs

compactness connectivity and covenge These four heurlstus are used 0 order he set of

alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut

ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~

selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons

~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he

eurstlc value of a substructur

29

urlent set of Input xam-llS

deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e

substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le

Input examples

r rtlations Sl compactnns(S) = ----shy

For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus

conpactness bull 4 bull 1

The third heUristic connectivity measures the amount of extemal connection in 1he

occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of

external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be

onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed

connectivit-Spound) = locel

Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In

Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure

22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO

outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons

Thus connectivity (12 41 bull 13

T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es

31

ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt

First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~

tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as

there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle

substructure as denned in Section 3U The Object names within the substructures le aroltrar

symbols generated by the system for eacb ewly constructed substructure

S bull i lt 11 OBJECT-0001gt f

~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S

and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by

addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5

ae st)red In E

Input Example Substructure

amp i 511

~I Rl I- lSi ~

52 I S I

LJ L

Figure 3-1 Simple Example

33

11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~

1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l

le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~

suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~

by the algorabm

33 Sllbstructure Specialization

The substructure specialiution module In SLBDLE employs t simple teChlllq ue r

s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25

SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added

attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e

)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh

the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e

i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota

eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d

knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e

s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton

rocess

331 Specia1ization Algorithm

Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te

algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~

elamrles

he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~

Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l

osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~

eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e

35

~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor

Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle

mount of specializatlon In Srpc

A-rount_of_spKlaLization( S_) = --_______ (occl

The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture

acKiround knowledge along w1tb the originally discovered substrucure

332 Example

As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a

lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute

-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he

sare substruc~ure emerges as in Figure 3-

S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt

ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule

Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES

A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l

~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli

s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt

Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~

SPECSLBS In thIs example IS

S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt

T~is specialized substructure also las J occurrerces but 3 untque values thus

amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC

specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he

substrucnre background knowledge along with the oriiinally discovered substucur

SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon

about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be

s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred

substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq

SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue

nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization

nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be

mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec

s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks

3 SubstMlcture Background Kllowledge

T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts

39

O T SHA ETRL~GLE SHAPE-SQLARE

Figure 37 Substructure Backgolnd KnowledgeEumpie

ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt

su~ported

As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge

lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle

ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s

SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y

Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute

est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt

he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn

n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and

~1

1--- ~ed v blue ~

i I shyI

I~

I

I

I

I

SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE

Fgure 39 Three Background Knoledge $ucstroJcJres

he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e

new substructu-es In terms of the substructures already known

3 bull 42 IdentifyiD1 Substructures in Eumples

As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-

~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___

~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I

i SH~PETRIAGlE i t

I I

I I I SHAPEmiddotSQlARE

[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]

Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]

Fig-ure 310 Substructure IdentiAcation Eumple

substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles

disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks

--

lll=H Eumple - r

i III

I I H

8r- shyj

V V

Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red

C Value(S)a8

L 6 a

al uelt S)-312

alueltS)-21

0 alue(S)96

middotalue(S)-11

6 I

bt

10 ~ V

- -

I

alue(S)middot31~ aue(S~-96 -- - middotalue(S)92

A I

j

aI ValueS)-312 I

I I

I bull

---

- -I

Vaiue(S)-103

rgure 41 First Etample of EXiIrinent 1

elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS

of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll

EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$

overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0

consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r

be Input examples

2 Ex~riment 2 Specialization and Background Knowledge

The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem

Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0

~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS

lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege

During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task

As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut

discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs

acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa

Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed

subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk

The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure

-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e

cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s

shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese

discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized

substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed

rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces

the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed

substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe

background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to

deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe

background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres

Ii

C 0

H- C C - Ii C1

(Sr C1 rl

C C ~

C C C-H C C- ii C C-Ii 1 i

~ Sr- C C C

H-C C-Ii H C

Ii H

H

fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule

13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples

tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~

of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna

clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn

whIch to focus ile attentIon of the conceptual clusterI~ rocess

A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA

Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer

lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific

Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e

added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features

t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and

CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered

Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe

SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas

r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single

)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d

subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle

substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure

spannng ncre han one ualple

Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni

and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle

53

nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy

descrlOed il [10gensel87J the two best d usterngs discovered are

Color of engine Aheels IS Blackmiddot middotWhitemiddot

W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd

subs-~c~~re discovery =odule is

lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt

In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By

addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI

exa~Fles Je ~wo best lusterngs discovered lre

umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four

Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec

JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure

attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen

har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE

T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~

suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the

StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne

IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree

that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators

within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover

macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~

CiscJsses similrlties and differences between SLBDLE and PLoSD

Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure

of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The

Jperators forths conaln are taken from [isson80] and are repeated e10w

PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY

PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)

STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)

LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)

For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec

e

51

STACK(XZ)

~ CSTACKCYZ) PICKLPOi)

PCTDOW(Y)

Figure 49 Diseovered (alto-operator

system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy

vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be

~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex

45 Eltperiment 5 Data Abstraction and Feature Formation

EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he

cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3

arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng

on the same Tu3S Instruments Eplorer as the SLBDCE system

Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les

given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores

to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in

the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of

178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J

5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C

SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c

61

--

~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll

Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~

hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH

program searCles for tWo tpes of substrucure In tbe blocks world domain The first type

involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a

set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by

the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes

of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall

1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he

ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of

SlBDLE

lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S

cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce

OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence

~ ~ shyB

I ~ (

E G- c c=7 0 IA B CD

i Lr

(a) ( )

63

A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1

0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL

E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL

The tommcm-realiotships-list contairs the four relations Ossessed by more than half the

candidates

Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt

~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le

Sumber of propUiH In Intltwto

Number of propenlH n Ilnlon

vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-

ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre

A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5

65

~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e

ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~

difficult Running both tbe sequence and common relation discovery processes nore tban once

mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl

difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd

domain

T~e final comFarlson of 11e two syseS Involves ~he representation used to store the

discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the

subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe

sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot

lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1

roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse

Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines

comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe

semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto

oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0

SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert

background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l

more general epresentaticn such as the semantic network used y ~he ARCH pro~ran

53 COfDitive Optimization in olrs S~R Program

R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte

~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt

[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S

Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-

-(5-s)C =---shy

s

slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s

represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1

polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a

inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR

seeks a grammar whose eiementS m8llnize eVe

S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n

322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number

of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f

OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings

of he substrucre can be upressed as

Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy

cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not

conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on

he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into

~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture

lmprovements to tbe cognltive savings measure

~ Substructure Oiscovery ill Uihitehalls PLAD System

Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons

These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn

69

The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as

Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs

efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda

overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta

mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence

Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ

COXTEXT 1 ogsav macro OS

100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot

8 5 ~u - (~tmiddot z)middot

Instantiated Action Sequlnce lt B 112 Z ~11J Z

COlEXT 2 ogsav macro~s

20 ~l bull (~lumiddot Z)shy

Instantiated Action ~uence A B 11

Al interesting macrops discovered End

Figure 53 PLAXD Etam~le

1

OLPTER 6

COSC1USION

Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S

of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty

Url about such an environnent the entity nust abstract over unInuresting jetJil and discover

substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s

teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced

domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l

this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve

61 Summary

The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl

onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s

aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e

environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge

equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl

aleratves nvoiec In 3 computational method 01 discovering sustrucure

A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y

he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans

omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg

t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre

imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5

T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~

iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~

he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~

optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~

The ~ialization module modina tile substructure to apply in a more constrained ewironme

Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i

nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S

aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f

be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes

and the possible applications of the SLBDLE substructure discovery system in a vanety f

domaltls

Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf

~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-

lass vf problems in the area of nachine learning OJeraton In real-world domains demands f

eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates

n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll

of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes

Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1

S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~

62 Future Research

$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE

system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei

include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS

Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy

reVIOUS suess of silIlLu rJe appiiauns

esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of

substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can

tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]

lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can

be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions

in both the background knowledge architecture 3nd the opeations peforned by the discovery

algorlthm

Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess

may arise from a better method of substrlctie instantiation C rrent letbocs do not

satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation

by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C

lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres

exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba

Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods

would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery

process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt

the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be

discovered with a single application of the discovery algcnthr1

Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~

uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve

the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery

rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge

z~ess l ~

One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount

knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a

knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts

623 Applications

Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste

SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and

compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung

systems

One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage

composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e

eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon

slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of

abstraction

The n~ to access information fom larse databases has betooe oonlonlac In ncs

omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so

do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver

repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard

Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte

database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture

sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1

9

REFERE~CES

[Doyie79J

[HoaS3]

L --]~ ason

~Iicalski33bl

G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)

J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162

J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27

R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288

K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987

W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933

W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947

J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J

General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16

J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977

D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC

~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239

R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy

R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13

R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363

S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3

T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50

J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp

~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~

83

- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es

DeJ IS 11

dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T

s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11

nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11

race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T

Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order

=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je

~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form

(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I

The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC

The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure

ltela01s s the list ~f ~eatlons deining le substrucure r occurenc

In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage

dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec

85

gt bull ~

)~c1 tt I

0 c5 ~ -

13 Output for First Example of Experiment 1 Limit - 6

3qll rilebullbullbullbull

anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~

Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll

S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi

-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I

(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l

Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD

-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i

S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)

11 aCC

A14 Output for First Example of E(periment 1 Limit 10

gt svIdue liait 10)

n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t

Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco

T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~

S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO

~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS

~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J

A 16 Input for Second Example ot Experiment 1

Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J

conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)

Al7 Output for Second Example ot Etperitnent 1 Limit - 4

l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I

5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs

c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I

middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt

middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I

So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS

coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j

ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103

~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~

89

S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a

ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO

~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H

ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09

lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()

~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull

[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I

l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i

[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~

C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i

SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1

co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i

middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))

Ec ~lce

Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10

Betti ~Ice

i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~

umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1

SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -

1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G

91

s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no

TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~

~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f

=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ

~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i

A2 Experiment 2

Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd

saostructure background knowledge nodule Two examples from the chemical dooain ae

eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng

previously disc ered subst-uc~ures to subsequent discovery tasks

11 Input for First Eumple of poundXperimenl 2

kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i

S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))

slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)

sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J

~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI

Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))

VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)

( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I

(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i

(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I

sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)

Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j

SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)

~o e ) ~Wlnt ubstUC~oltes

Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo

95

gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~

U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l

~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J

carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )

Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))

Defxollrii ilill J ~cIl Clr carJ)

I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)

Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J

ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))

A32 Output for Experiment 3

gt lubclue inl~ JO)

Seli tlcebullbull_

m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil

SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo

aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot

ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~

licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)

Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc

wri OCCtilJtOiCS

101

4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4

~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i

(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))

42 Output for Experiment 4

PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil

Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)

ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl

u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1

O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~

s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI

W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -

~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a

S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1

TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a

103

~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~

Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-

bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)

ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)

I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))

r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l

(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)

(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)

coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)

Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i

(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at

Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e

(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ

~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj

OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs

I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten

~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull

9

SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1

I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1

Addina substructures 0 SKbullbullbull

O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1

5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I

A3 Experimeut 3

Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15

hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le

desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e

99

sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~

5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~

1~Ie c2 lt6 ~u)

QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32

c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)

lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J

SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))

A52 Output for Experiment S

gt (sulxh bullbull lmu )

Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll

RUMinl sulntucture discovey

DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl

Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~

WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11

( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I

(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I

[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt

~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90

Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J

ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot

~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I

10$

bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D

icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)

([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD

((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I

([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~

101

middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a

oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0

CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1

Ed ~lCe

109

Page 13: DISCOVERING SUBSTRUCTURE IN EXAMPLES

[nput Example Substlcture

A)

V

(b) Object Overlapping Substructure

(c) Object and Relation Overlapping Substructure

Figure 22 Disjoint and Overlappin Substructures

Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~

~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the

densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle

lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1

factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl

Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e

(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing

subsuIctures and substructure speialization to attach contextual nformation to newiy discovered

substructures

2S Substructure mstampl1tialioD

Once an interesting substructure is disltovered the input eumples can be recast by replactng

the obJects and relations of each occurrence of the substructure with a single entity representing

tbe abstact substructure This replacement is termed substructure illstanliatwlt After

pltrforming one substructure instantiation a substrueture discovery system may continue to

discover more abstract substructures in terms of those already instantiated in the input eumples

This section considers two methods of substrueture instantiation

One method of substructure instantiation replaees eaeb occurrence by a single object

Difficulties in using this method arise from representation roblems Although all the ob]fCts and

relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not

replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be

maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only

confounds the insuntiaion problem In the case of object overlap each instartation of the

overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of

relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as

well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre

instantiation using a singie new object Although single object substructure instantiation seens

intuitively promising tbe accompanying representation problems are difficult to overcome

An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of

the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the

slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the

17

3 SubStructure Generation

AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~

and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l

bottom-up and top-down

The bottom-up approach to substructure generation be~lns with the smallest substrJctures n

the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished

by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0

the substructure For enmple according to tbe tllee neighboring relations (presented in Section

2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the

substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures

lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt

lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt

lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt

The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O

S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For

example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10

suestrJures

ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt

The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie

suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0

smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be

11

lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1

jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_

uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly

dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec

substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1

are applicable here

Regardless of how tae metbods are applied each metbod bas advantages and disadvantages

Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev

SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle

combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a

larger substructure but a smaller more desirable substructure may be overlooked in the process

Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS

referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones

linimal expansion begins with smaller substructures expands substructures along one relation

lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource

imlts of tbe system

2~ Substructure Selection

Aier using tbe metbods of the previous sectlon to construct l set of alternatlve

substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0

onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e

roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie

substructures oased on ~her heuristic quality This section presents the major heuristics that 3re

Jpplicable to substructure evaluation

The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata

cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie

savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e

13

Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad

eiatlons of ~be origmal OCClrre1lces may be reconstructea

T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=

replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla

obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue

ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of

concepts deAned In teClS of the Instantiated substructures

26 Substructure Specialization

Specializing substructures is an essential capabtlity of a substructure discovery system For

Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f

mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences

Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the

aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached

atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this

specialization step allows the substructure discovery system to take advantage of additional

information in the input examples avoid learning overly general substructure concepts and Clore

rapidly discover a specific disjunctive substructure concet T115 section resents an approach to

substructure specialization

One technique for specializing a substructure is to ~rform inductive Inference on the

extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the

substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended

occurrences consists of the substructures obtained by upanding each occurrence llth each

neighboring relation of the occurrence Once the set of eltended occurrences is onstructed

Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added

neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg

relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres

19

2 - Background Knowledampe

Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s

wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e

discovery process along more promising paths through tbe space of alternatlve substructJres T~lS

section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate

substructures that are more likely to list in tbe current application dOClaln

The ~wo major functions of tbe substructure backround knowledge are to main tain boh

lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a

glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of

substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu

substructures are defined in terms of more primitive substructures This arrangement suggests ~n

archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie

substructures serve as the justifications for more complex substructures at a higher level in tle

hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote

substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie

substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures

elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying

tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput

uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are

ultimately justified by the relations in tbe input eumples The substructure discoey roess

may then use these background knowledge substructures as a starting point for ieneratllg

alternative substructures

Other f Jrms of background knowledge apply to the task of substrlcure ilscovery

mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses

~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee

1

L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do

BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do

if (not (member e 0raquo S S U leI

return D

Figure 24 SubstrUcture Discovery ~lgoritbm

Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom

the substructures In S until either the computational limlt L is exceeded or the set of candidate

substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe

heuristics of Section 2A are employed to choose the best substructure from the llternatives in S

The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he

implementation Section 322 descibes the comrutatlon used in StBDlE although otbe

methods may be used For instance the heutlStics could be weighted selected by lhe user or

selected by lhe backcround knowledae However uy substructure selecion method should

involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in

BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of

discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi

Section 23 are then used to construct a set of new substructures that are stored in E Those

subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the

loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res

23

CHAPTER 3

mE SUBDUE SYSTE~I

An implementation of he substructure discovery methodology descrIbed in the frevous

chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments

Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a

substructure concept discoverer and as a mod le in a more robust nachlne learning system In

addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a

sbstructure specialization module and a substructure background knowledge module for utiiizilg

prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd

krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres

ovillCh of these substruc1res are resent in the lnput examples

i

SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of

SubstructJre Speeializer-k of(

C ser-Supplied Input Exampies Substructure Discovery

Figure 31 Tlle SlSDLE System

(a) SUbstructure

[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]

(b) External Representation

I I

~ Sl

i V

~-T t -

I

~ Cl i

(c) Inte~al Represe~tation

Figure 32 Substructure Representation

21

SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do

SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI

return -EWSlBS

Figure 33 Substructure Elpanslon Algorithm

Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae

stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are

stored in For each neighboring relation a new substructure is formed by adding the nelghborrg

-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to

the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een

considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes

~xanded from the original substructure

322 Selection

The heuristic-based substructure discovery module selects for orslderation those

substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs

compactness connectivity and covenge These four heurlstus are used 0 order he set of

alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut

ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~

selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons

~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he

eurstlc value of a substructur

29

urlent set of Input xam-llS

deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e

substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le

Input examples

r rtlations Sl compactnns(S) = ----shy

For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus

conpactness bull 4 bull 1

The third heUristic connectivity measures the amount of extemal connection in 1he

occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of

external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be

onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed

connectivit-Spound) = locel

Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In

Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure

22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO

outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons

Thus connectivity (12 41 bull 13

T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es

31

ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt

First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~

tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as

there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle

substructure as denned in Section 3U The Object names within the substructures le aroltrar

symbols generated by the system for eacb ewly constructed substructure

S bull i lt 11 OBJECT-0001gt f

~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S

and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by

addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5

ae st)red In E

Input Example Substructure

amp i 511

~I Rl I- lSi ~

52 I S I

LJ L

Figure 3-1 Simple Example

33

11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~

1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l

le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~

suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~

by the algorabm

33 Sllbstructure Specialization

The substructure specialiution module In SLBDLE employs t simple teChlllq ue r

s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25

SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added

attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e

)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh

the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e

i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota

eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d

knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e

s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton

rocess

331 Specia1ization Algorithm

Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te

algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~

elamrles

he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~

Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l

osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~

eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e

35

~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor

Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle

mount of specializatlon In Srpc

A-rount_of_spKlaLization( S_) = --_______ (occl

The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture

acKiround knowledge along w1tb the originally discovered substrucure

332 Example

As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a

lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute

-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he

sare substruc~ure emerges as in Figure 3-

S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt

ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule

Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES

A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l

~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli

s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt

Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~

SPECSLBS In thIs example IS

S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt

T~is specialized substructure also las J occurrerces but 3 untque values thus

amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC

specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he

substrucnre background knowledge along with the oriiinally discovered substucur

SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon

about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be

s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred

substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq

SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue

nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization

nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be

mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec

s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks

3 SubstMlcture Background Kllowledge

T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts

39

O T SHA ETRL~GLE SHAPE-SQLARE

Figure 37 Substructure Backgolnd KnowledgeEumpie

ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt

su~ported

As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge

lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle

ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s

SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y

Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute

est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt

he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn

n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and

~1

1--- ~ed v blue ~

i I shyI

I~

I

I

I

I

SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE

Fgure 39 Three Background Knoledge $ucstroJcJres

he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e

new substructu-es In terms of the substructures already known

3 bull 42 IdentifyiD1 Substructures in Eumples

As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-

~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___

~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I

i SH~PETRIAGlE i t

I I

I I I SHAPEmiddotSQlARE

[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]

Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]

Fig-ure 310 Substructure IdentiAcation Eumple

substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles

disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks

--

lll=H Eumple - r

i III

I I H

8r- shyj

V V

Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red

C Value(S)a8

L 6 a

al uelt S)-312

alueltS)-21

0 alue(S)96

middotalue(S)-11

6 I

bt

10 ~ V

- -

I

alue(S)middot31~ aue(S~-96 -- - middotalue(S)92

A I

j

aI ValueS)-312 I

I I

I bull

---

- -I

Vaiue(S)-103

rgure 41 First Etample of EXiIrinent 1

elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS

of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll

EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$

overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0

consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r

be Input examples

2 Ex~riment 2 Specialization and Background Knowledge

The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem

Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0

~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS

lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege

During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task

As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut

discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs

acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa

Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed

subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk

The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure

-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e

cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s

shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese

discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized

substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed

rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces

the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed

substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe

background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to

deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe

background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres

Ii

C 0

H- C C - Ii C1

(Sr C1 rl

C C ~

C C C-H C C- ii C C-Ii 1 i

~ Sr- C C C

H-C C-Ii H C

Ii H

H

fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule

13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples

tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~

of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna

clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn

whIch to focus ile attentIon of the conceptual clusterI~ rocess

A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA

Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer

lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific

Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e

added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features

t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and

CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered

Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe

SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas

r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single

)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d

subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle

substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure

spannng ncre han one ualple

Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni

and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle

53

nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy

descrlOed il [10gensel87J the two best d usterngs discovered are

Color of engine Aheels IS Blackmiddot middotWhitemiddot

W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd

subs-~c~~re discovery =odule is

lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt

In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By

addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI

exa~Fles Je ~wo best lusterngs discovered lre

umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four

Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec

JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure

attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen

har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE

T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~

suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the

StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne

IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree

that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators

within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover

macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~

CiscJsses similrlties and differences between SLBDLE and PLoSD

Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure

of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The

Jperators forths conaln are taken from [isson80] and are repeated e10w

PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY

PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)

STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)

LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)

For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec

e

51

STACK(XZ)

~ CSTACKCYZ) PICKLPOi)

PCTDOW(Y)

Figure 49 Diseovered (alto-operator

system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy

vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be

~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex

45 Eltperiment 5 Data Abstraction and Feature Formation

EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he

cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3

arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng

on the same Tu3S Instruments Eplorer as the SLBDCE system

Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les

given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores

to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in

the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of

178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J

5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C

SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c

61

--

~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll

Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~

hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH

program searCles for tWo tpes of substrucure In tbe blocks world domain The first type

involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a

set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by

the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes

of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall

1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he

ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of

SlBDLE

lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S

cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce

OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence

~ ~ shyB

I ~ (

E G- c c=7 0 IA B CD

i Lr

(a) ( )

63

A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1

0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL

E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL

The tommcm-realiotships-list contairs the four relations Ossessed by more than half the

candidates

Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt

~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le

Sumber of propUiH In Intltwto

Number of propenlH n Ilnlon

vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-

ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre

A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5

65

~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e

ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~

difficult Running both tbe sequence and common relation discovery processes nore tban once

mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl

difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd

domain

T~e final comFarlson of 11e two syseS Involves ~he representation used to store the

discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the

subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe

sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot

lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1

roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse

Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines

comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe

semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto

oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0

SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert

background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l

more general epresentaticn such as the semantic network used y ~he ARCH pro~ran

53 COfDitive Optimization in olrs S~R Program

R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte

~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt

[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S

Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-

-(5-s)C =---shy

s

slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s

represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1

polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a

inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR

seeks a grammar whose eiementS m8llnize eVe

S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n

322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number

of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f

OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings

of he substrucre can be upressed as

Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy

cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not

conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on

he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into

~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture

lmprovements to tbe cognltive savings measure

~ Substructure Oiscovery ill Uihitehalls PLAD System

Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons

These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn

69

The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as

Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs

efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda

overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta

mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence

Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ

COXTEXT 1 ogsav macro OS

100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot

8 5 ~u - (~tmiddot z)middot

Instantiated Action Sequlnce lt B 112 Z ~11J Z

COlEXT 2 ogsav macro~s

20 ~l bull (~lumiddot Z)shy

Instantiated Action ~uence A B 11

Al interesting macrops discovered End

Figure 53 PLAXD Etam~le

1

OLPTER 6

COSC1USION

Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S

of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty

Url about such an environnent the entity nust abstract over unInuresting jetJil and discover

substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s

teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced

domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l

this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve

61 Summary

The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl

onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s

aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e

environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge

equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl

aleratves nvoiec In 3 computational method 01 discovering sustrucure

A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y

he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans

omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg

t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre

imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5

T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~

iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~

he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~

optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~

The ~ialization module modina tile substructure to apply in a more constrained ewironme

Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i

nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S

aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f

be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes

and the possible applications of the SLBDLE substructure discovery system in a vanety f

domaltls

Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf

~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-

lass vf problems in the area of nachine learning OJeraton In real-world domains demands f

eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates

n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll

of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes

Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1

S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~

62 Future Research

$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE

system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei

include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS

Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy

reVIOUS suess of silIlLu rJe appiiauns

esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of

substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can

tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]

lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can

be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions

in both the background knowledge architecture 3nd the opeations peforned by the discovery

algorlthm

Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess

may arise from a better method of substrlctie instantiation C rrent letbocs do not

satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation

by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C

lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres

exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba

Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods

would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery

process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt

the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be

discovered with a single application of the discovery algcnthr1

Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~

uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve

the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery

rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge

z~ess l ~

One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount

knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a

knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts

623 Applications

Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste

SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and

compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung

systems

One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage

composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e

eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon

slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of

abstraction

The n~ to access information fom larse databases has betooe oonlonlac In ncs

omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so

do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver

repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard

Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte

database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture

sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1

9

REFERE~CES

[Doyie79J

[HoaS3]

L --]~ ason

~Iicalski33bl

G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)

J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162

J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27

R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288

K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987

W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933

W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947

J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J

General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16

J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977

D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC

~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239

R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy

R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13

R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363

S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3

T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50

J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp

~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~

83

- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es

DeJ IS 11

dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T

s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11

nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11

race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T

Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order

=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je

~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form

(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I

The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC

The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure

ltela01s s the list ~f ~eatlons deining le substrucure r occurenc

In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage

dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec

85

gt bull ~

)~c1 tt I

0 c5 ~ -

13 Output for First Example of Experiment 1 Limit - 6

3qll rilebullbullbullbull

anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~

Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll

S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi

-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I

(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l

Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD

-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i

S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)

11 aCC

A14 Output for First Example of E(periment 1 Limit 10

gt svIdue liait 10)

n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t

Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco

T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~

S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO

~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS

~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J

A 16 Input for Second Example ot Experiment 1

Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J

conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)

Al7 Output for Second Example ot Etperitnent 1 Limit - 4

l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I

5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs

c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I

middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt

middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I

So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS

coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j

ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103

~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~

89

S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a

ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO

~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H

ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09

lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()

~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull

[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I

l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i

[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~

C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i

SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1

co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i

middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))

Ec ~lce

Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10

Betti ~Ice

i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~

umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1

SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -

1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G

91

s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no

TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~

~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f

=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ

~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i

A2 Experiment 2

Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd

saostructure background knowledge nodule Two examples from the chemical dooain ae

eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng

previously disc ered subst-uc~ures to subsequent discovery tasks

11 Input for First Eumple of poundXperimenl 2

kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i

S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))

slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)

sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J

~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI

Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))

VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)

( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I

(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i

(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I

sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)

Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j

SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)

~o e ) ~Wlnt ubstUC~oltes

Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo

95

gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~

U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l

~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J

carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )

Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))

Defxollrii ilill J ~cIl Clr carJ)

I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)

Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J

ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))

A32 Output for Experiment 3

gt lubclue inl~ JO)

Seli tlcebullbull_

m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil

SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo

aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot

ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~

licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)

Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc

wri OCCtilJtOiCS

101

4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4

~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i

(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))

42 Output for Experiment 4

PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil

Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)

ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl

u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1

O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~

s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI

W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -

~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a

S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1

TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a

103

~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~

Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-

bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)

ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)

I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))

r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l

(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)

(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)

coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)

Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i

(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at

Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e

(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ

~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj

OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs

I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten

~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull

9

SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1

I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1

Addina substructures 0 SKbullbullbull

O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1

5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I

A3 Experimeut 3

Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15

hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le

desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e

99

sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~

5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~

1~Ie c2 lt6 ~u)

QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32

c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)

lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J

SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))

A52 Output for Experiment S

gt (sulxh bullbull lmu )

Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll

RUMinl sulntucture discovey

DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl

Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~

WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11

( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I

(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I

[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt

~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90

Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J

ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot

~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I

10$

bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D

icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)

([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD

((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I

([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~

101

middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a

oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0

CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1

Ed ~lCe

109

Page 14: DISCOVERING SUBSTRUCTURE IN EXAMPLES

factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl

Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e

(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing

subsuIctures and substructure speialization to attach contextual nformation to newiy discovered

substructures

2S Substructure mstampl1tialioD

Once an interesting substructure is disltovered the input eumples can be recast by replactng

the obJects and relations of each occurrence of the substructure with a single entity representing

tbe abstact substructure This replacement is termed substructure illstanliatwlt After

pltrforming one substructure instantiation a substrueture discovery system may continue to

discover more abstract substructures in terms of those already instantiated in the input eumples

This section considers two methods of substrueture instantiation

One method of substructure instantiation replaees eaeb occurrence by a single object

Difficulties in using this method arise from representation roblems Although all the ob]fCts and

relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not

replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be

maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only

confounds the insuntiaion problem In the case of object overlap each instartation of the

overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of

relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as

well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre

instantiation using a singie new object Although single object substructure instantiation seens

intuitively promising tbe accompanying representation problems are difficult to overcome

An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of

the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the

slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the

17

3 SubStructure Generation

AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~

and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l

bottom-up and top-down

The bottom-up approach to substructure generation be~lns with the smallest substrJctures n

the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished

by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0

the substructure For enmple according to tbe tllee neighboring relations (presented in Section

2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the

substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures

lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt

lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt

lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt

The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O

S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For

example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10

suestrJures

ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt

The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie

suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0

smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be

11

lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1

jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_

uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly

dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec

substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1

are applicable here

Regardless of how tae metbods are applied each metbod bas advantages and disadvantages

Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev

SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle

combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a

larger substructure but a smaller more desirable substructure may be overlooked in the process

Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS

referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones

linimal expansion begins with smaller substructures expands substructures along one relation

lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource

imlts of tbe system

2~ Substructure Selection

Aier using tbe metbods of the previous sectlon to construct l set of alternatlve

substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0

onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e

roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie

substructures oased on ~her heuristic quality This section presents the major heuristics that 3re

Jpplicable to substructure evaluation

The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata

cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie

savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e

13

Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad

eiatlons of ~be origmal OCClrre1lces may be reconstructea

T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=

replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla

obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue

ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of

concepts deAned In teClS of the Instantiated substructures

26 Substructure Specialization

Specializing substructures is an essential capabtlity of a substructure discovery system For

Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f

mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences

Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the

aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached

atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this

specialization step allows the substructure discovery system to take advantage of additional

information in the input examples avoid learning overly general substructure concepts and Clore

rapidly discover a specific disjunctive substructure concet T115 section resents an approach to

substructure specialization

One technique for specializing a substructure is to ~rform inductive Inference on the

extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the

substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended

occurrences consists of the substructures obtained by upanding each occurrence llth each

neighboring relation of the occurrence Once the set of eltended occurrences is onstructed

Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added

neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg

relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres

19

2 - Background Knowledampe

Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s

wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e

discovery process along more promising paths through tbe space of alternatlve substructJres T~lS

section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate

substructures that are more likely to list in tbe current application dOClaln

The ~wo major functions of tbe substructure backround knowledge are to main tain boh

lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a

glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of

substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu

substructures are defined in terms of more primitive substructures This arrangement suggests ~n

archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie

substructures serve as the justifications for more complex substructures at a higher level in tle

hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote

substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie

substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures

elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying

tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput

uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are

ultimately justified by the relations in tbe input eumples The substructure discoey roess

may then use these background knowledge substructures as a starting point for ieneratllg

alternative substructures

Other f Jrms of background knowledge apply to the task of substrlcure ilscovery

mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses

~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee

1

L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do

BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do

if (not (member e 0raquo S S U leI

return D

Figure 24 SubstrUcture Discovery ~lgoritbm

Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom

the substructures In S until either the computational limlt L is exceeded or the set of candidate

substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe

heuristics of Section 2A are employed to choose the best substructure from the llternatives in S

The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he

implementation Section 322 descibes the comrutatlon used in StBDlE although otbe

methods may be used For instance the heutlStics could be weighted selected by lhe user or

selected by lhe backcround knowledae However uy substructure selecion method should

involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in

BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of

discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi

Section 23 are then used to construct a set of new substructures that are stored in E Those

subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the

loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res

23

CHAPTER 3

mE SUBDUE SYSTE~I

An implementation of he substructure discovery methodology descrIbed in the frevous

chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments

Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a

substructure concept discoverer and as a mod le in a more robust nachlne learning system In

addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a

sbstructure specialization module and a substructure background knowledge module for utiiizilg

prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd

krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres

ovillCh of these substruc1res are resent in the lnput examples

i

SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of

SubstructJre Speeializer-k of(

C ser-Supplied Input Exampies Substructure Discovery

Figure 31 Tlle SlSDLE System

(a) SUbstructure

[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]

(b) External Representation

I I

~ Sl

i V

~-T t -

I

~ Cl i

(c) Inte~al Represe~tation

Figure 32 Substructure Representation

21

SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do

SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI

return -EWSlBS

Figure 33 Substructure Elpanslon Algorithm

Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae

stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are

stored in For each neighboring relation a new substructure is formed by adding the nelghborrg

-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to

the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een

considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes

~xanded from the original substructure

322 Selection

The heuristic-based substructure discovery module selects for orslderation those

substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs

compactness connectivity and covenge These four heurlstus are used 0 order he set of

alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut

ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~

selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons

~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he

eurstlc value of a substructur

29

urlent set of Input xam-llS

deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e

substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le

Input examples

r rtlations Sl compactnns(S) = ----shy

For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus

conpactness bull 4 bull 1

The third heUristic connectivity measures the amount of extemal connection in 1he

occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of

external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be

onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed

connectivit-Spound) = locel

Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In

Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure

22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO

outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons

Thus connectivity (12 41 bull 13

T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es

31

ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt

First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~

tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as

there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle

substructure as denned in Section 3U The Object names within the substructures le aroltrar

symbols generated by the system for eacb ewly constructed substructure

S bull i lt 11 OBJECT-0001gt f

~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S

and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by

addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5

ae st)red In E

Input Example Substructure

amp i 511

~I Rl I- lSi ~

52 I S I

LJ L

Figure 3-1 Simple Example

33

11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~

1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l

le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~

suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~

by the algorabm

33 Sllbstructure Specialization

The substructure specialiution module In SLBDLE employs t simple teChlllq ue r

s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25

SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added

attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e

)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh

the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e

i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota

eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d

knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e

s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton

rocess

331 Specia1ization Algorithm

Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te

algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~

elamrles

he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~

Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l

osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~

eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e

35

~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor

Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle

mount of specializatlon In Srpc

A-rount_of_spKlaLization( S_) = --_______ (occl

The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture

acKiround knowledge along w1tb the originally discovered substrucure

332 Example

As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a

lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute

-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he

sare substruc~ure emerges as in Figure 3-

S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt

ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule

Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES

A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l

~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli

s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt

Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~

SPECSLBS In thIs example IS

S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt

T~is specialized substructure also las J occurrerces but 3 untque values thus

amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC

specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he

substrucnre background knowledge along with the oriiinally discovered substucur

SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon

about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be

s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred

substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq

SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue

nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization

nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be

mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec

s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks

3 SubstMlcture Background Kllowledge

T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts

39

O T SHA ETRL~GLE SHAPE-SQLARE

Figure 37 Substructure Backgolnd KnowledgeEumpie

ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt

su~ported

As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge

lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle

ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s

SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y

Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute

est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt

he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn

n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and

~1

1--- ~ed v blue ~

i I shyI

I~

I

I

I

I

SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE

Fgure 39 Three Background Knoledge $ucstroJcJres

he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e

new substructu-es In terms of the substructures already known

3 bull 42 IdentifyiD1 Substructures in Eumples

As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-

~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___

~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I

i SH~PETRIAGlE i t

I I

I I I SHAPEmiddotSQlARE

[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]

Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]

Fig-ure 310 Substructure IdentiAcation Eumple

substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles

disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks

--

lll=H Eumple - r

i III

I I H

8r- shyj

V V

Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red

C Value(S)a8

L 6 a

al uelt S)-312

alueltS)-21

0 alue(S)96

middotalue(S)-11

6 I

bt

10 ~ V

- -

I

alue(S)middot31~ aue(S~-96 -- - middotalue(S)92

A I

j

aI ValueS)-312 I

I I

I bull

---

- -I

Vaiue(S)-103

rgure 41 First Etample of EXiIrinent 1

elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS

of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll

EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$

overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0

consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r

be Input examples

2 Ex~riment 2 Specialization and Background Knowledge

The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem

Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0

~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS

lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege

During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task

As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut

discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs

acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa

Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed

subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk

The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure

-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e

cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s

shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese

discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized

substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed

rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces

the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed

substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe

background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to

deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe

background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres

Ii

C 0

H- C C - Ii C1

(Sr C1 rl

C C ~

C C C-H C C- ii C C-Ii 1 i

~ Sr- C C C

H-C C-Ii H C

Ii H

H

fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule

13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples

tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~

of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna

clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn

whIch to focus ile attentIon of the conceptual clusterI~ rocess

A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA

Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer

lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific

Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e

added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features

t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and

CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered

Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe

SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas

r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single

)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d

subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle

substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure

spannng ncre han one ualple

Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni

and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle

53

nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy

descrlOed il [10gensel87J the two best d usterngs discovered are

Color of engine Aheels IS Blackmiddot middotWhitemiddot

W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd

subs-~c~~re discovery =odule is

lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt

In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By

addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI

exa~Fles Je ~wo best lusterngs discovered lre

umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four

Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec

JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure

attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen

har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE

T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~

suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the

StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne

IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree

that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators

within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover

macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~

CiscJsses similrlties and differences between SLBDLE and PLoSD

Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure

of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The

Jperators forths conaln are taken from [isson80] and are repeated e10w

PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY

PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)

STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)

LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)

For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec

e

51

STACK(XZ)

~ CSTACKCYZ) PICKLPOi)

PCTDOW(Y)

Figure 49 Diseovered (alto-operator

system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy

vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be

~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex

45 Eltperiment 5 Data Abstraction and Feature Formation

EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he

cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3

arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng

on the same Tu3S Instruments Eplorer as the SLBDCE system

Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les

given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores

to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in

the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of

178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J

5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C

SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c

61

--

~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll

Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~

hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH

program searCles for tWo tpes of substrucure In tbe blocks world domain The first type

involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a

set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by

the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes

of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall

1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he

ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of

SlBDLE

lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S

cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce

OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence

~ ~ shyB

I ~ (

E G- c c=7 0 IA B CD

i Lr

(a) ( )

63

A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1

0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL

E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL

The tommcm-realiotships-list contairs the four relations Ossessed by more than half the

candidates

Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt

~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le

Sumber of propUiH In Intltwto

Number of propenlH n Ilnlon

vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-

ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre

A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5

65

~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e

ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~

difficult Running both tbe sequence and common relation discovery processes nore tban once

mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl

difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd

domain

T~e final comFarlson of 11e two syseS Involves ~he representation used to store the

discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the

subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe

sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot

lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1

roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse

Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines

comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe

semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto

oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0

SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert

background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l

more general epresentaticn such as the semantic network used y ~he ARCH pro~ran

53 COfDitive Optimization in olrs S~R Program

R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte

~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt

[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S

Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-

-(5-s)C =---shy

s

slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s

represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1

polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a

inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR

seeks a grammar whose eiementS m8llnize eVe

S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n

322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number

of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f

OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings

of he substrucre can be upressed as

Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy

cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not

conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on

he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into

~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture

lmprovements to tbe cognltive savings measure

~ Substructure Oiscovery ill Uihitehalls PLAD System

Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons

These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn

69

The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as

Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs

efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda

overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta

mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence

Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ

COXTEXT 1 ogsav macro OS

100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot

8 5 ~u - (~tmiddot z)middot

Instantiated Action Sequlnce lt B 112 Z ~11J Z

COlEXT 2 ogsav macro~s

20 ~l bull (~lumiddot Z)shy

Instantiated Action ~uence A B 11

Al interesting macrops discovered End

Figure 53 PLAXD Etam~le

1

OLPTER 6

COSC1USION

Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S

of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty

Url about such an environnent the entity nust abstract over unInuresting jetJil and discover

substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s

teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced

domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l

this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve

61 Summary

The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl

onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s

aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e

environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge

equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl

aleratves nvoiec In 3 computational method 01 discovering sustrucure

A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y

he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans

omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg

t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre

imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5

T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~

iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~

he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~

optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~

The ~ialization module modina tile substructure to apply in a more constrained ewironme

Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i

nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S

aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f

be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes

and the possible applications of the SLBDLE substructure discovery system in a vanety f

domaltls

Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf

~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-

lass vf problems in the area of nachine learning OJeraton In real-world domains demands f

eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates

n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll

of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes

Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1

S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~

62 Future Research

$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE

system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei

include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS

Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy

reVIOUS suess of silIlLu rJe appiiauns

esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of

substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can

tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]

lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can

be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions

in both the background knowledge architecture 3nd the opeations peforned by the discovery

algorlthm

Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess

may arise from a better method of substrlctie instantiation C rrent letbocs do not

satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation

by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C

lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres

exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba

Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods

would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery

process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt

the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be

discovered with a single application of the discovery algcnthr1

Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~

uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve

the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery

rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge

z~ess l ~

One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount

knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a

knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts

623 Applications

Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste

SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and

compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung

systems

One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage

composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e

eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon

slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of

abstraction

The n~ to access information fom larse databases has betooe oonlonlac In ncs

omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so

do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver

repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard

Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte

database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture

sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1

9

REFERE~CES

[Doyie79J

[HoaS3]

L --]~ ason

~Iicalski33bl

G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)

J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162

J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27

R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288

K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987

W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933

W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947

J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J

General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16

J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977

D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC

~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239

R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy

R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13

R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363

S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3

T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50

J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp

~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~

83

- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es

DeJ IS 11

dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T

s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11

nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11

race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T

Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order

=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je

~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form

(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I

The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC

The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure

ltela01s s the list ~f ~eatlons deining le substrucure r occurenc

In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage

dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec

85

gt bull ~

)~c1 tt I

0 c5 ~ -

13 Output for First Example of Experiment 1 Limit - 6

3qll rilebullbullbullbull

anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~

Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll

S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi

-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I

(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l

Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD

-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i

S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)

11 aCC

A14 Output for First Example of E(periment 1 Limit 10

gt svIdue liait 10)

n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t

Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco

T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~

S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO

~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS

~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J

A 16 Input for Second Example ot Experiment 1

Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J

conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)

Al7 Output for Second Example ot Etperitnent 1 Limit - 4

l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I

5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs

c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I

middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt

middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I

So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS

coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j

ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103

~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~

89

S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a

ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO

~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H

ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09

lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()

~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull

[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I

l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i

[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~

C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i

SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1

co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i

middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))

Ec ~lce

Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10

Betti ~Ice

i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~

umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1

SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -

1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G

91

s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no

TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~

~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f

=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ

~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i

A2 Experiment 2

Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd

saostructure background knowledge nodule Two examples from the chemical dooain ae

eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng

previously disc ered subst-uc~ures to subsequent discovery tasks

11 Input for First Eumple of poundXperimenl 2

kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i

S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))

slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)

sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J

~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI

Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))

VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)

( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I

(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i

(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I

sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)

Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j

SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)

~o e ) ~Wlnt ubstUC~oltes

Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo

95

gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~

U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l

~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J

carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )

Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))

Defxollrii ilill J ~cIl Clr carJ)

I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)

Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J

ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))

A32 Output for Experiment 3

gt lubclue inl~ JO)

Seli tlcebullbull_

m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil

SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo

aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot

ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~

licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)

Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc

wri OCCtilJtOiCS

101

4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4

~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i

(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))

42 Output for Experiment 4

PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil

Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)

ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl

u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1

O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~

s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI

W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -

~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a

S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1

TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a

103

~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~

Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-

bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)

ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)

I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))

r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l

(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)

(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)

coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)

Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i

(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at

Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e

(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ

~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj

OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs

I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten

~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull

9

SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1

I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1

Addina substructures 0 SKbullbullbull

O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1

5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I

A3 Experimeut 3

Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15

hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le

desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e

99

sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~

5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~

1~Ie c2 lt6 ~u)

QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32

c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)

lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J

SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))

A52 Output for Experiment S

gt (sulxh bullbull lmu )

Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll

RUMinl sulntucture discovey

DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl

Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~

WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11

( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I

(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I

[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt

~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90

Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J

ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot

~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I

10$

bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D

icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)

([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD

((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I

([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~

101

middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a

oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0

CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1

Ed ~lCe

109

Page 15: DISCOVERING SUBSTRUCTURE IN EXAMPLES

3 SubStructure Generation

AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~

and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l

bottom-up and top-down

The bottom-up approach to substructure generation be~lns with the smallest substrJctures n

the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished

by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0

the substructure For enmple according to tbe tllee neighboring relations (presented in Section

2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the

substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures

lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt

lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt

lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt

The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O

S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For

example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10

suestrJures

ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt

The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie

suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0

smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be

11

lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1

jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_

uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly

dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec

substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1

are applicable here

Regardless of how tae metbods are applied each metbod bas advantages and disadvantages

Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev

SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle

combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a

larger substructure but a smaller more desirable substructure may be overlooked in the process

Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS

referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones

linimal expansion begins with smaller substructures expands substructures along one relation

lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource

imlts of tbe system

2~ Substructure Selection

Aier using tbe metbods of the previous sectlon to construct l set of alternatlve

substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0

onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e

roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie

substructures oased on ~her heuristic quality This section presents the major heuristics that 3re

Jpplicable to substructure evaluation

The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata

cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie

savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e

13

Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad

eiatlons of ~be origmal OCClrre1lces may be reconstructea

T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=

replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla

obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue

ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of

concepts deAned In teClS of the Instantiated substructures

26 Substructure Specialization

Specializing substructures is an essential capabtlity of a substructure discovery system For

Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f

mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences

Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the

aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached

atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this

specialization step allows the substructure discovery system to take advantage of additional

information in the input examples avoid learning overly general substructure concepts and Clore

rapidly discover a specific disjunctive substructure concet T115 section resents an approach to

substructure specialization

One technique for specializing a substructure is to ~rform inductive Inference on the

extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the

substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended

occurrences consists of the substructures obtained by upanding each occurrence llth each

neighboring relation of the occurrence Once the set of eltended occurrences is onstructed

Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added

neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg

relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres

19

2 - Background Knowledampe

Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s

wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e

discovery process along more promising paths through tbe space of alternatlve substructJres T~lS

section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate

substructures that are more likely to list in tbe current application dOClaln

The ~wo major functions of tbe substructure backround knowledge are to main tain boh

lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a

glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of

substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu

substructures are defined in terms of more primitive substructures This arrangement suggests ~n

archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie

substructures serve as the justifications for more complex substructures at a higher level in tle

hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote

substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie

substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures

elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying

tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput

uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are

ultimately justified by the relations in tbe input eumples The substructure discoey roess

may then use these background knowledge substructures as a starting point for ieneratllg

alternative substructures

Other f Jrms of background knowledge apply to the task of substrlcure ilscovery

mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses

~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee

1

L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do

BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do

if (not (member e 0raquo S S U leI

return D

Figure 24 SubstrUcture Discovery ~lgoritbm

Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom

the substructures In S until either the computational limlt L is exceeded or the set of candidate

substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe

heuristics of Section 2A are employed to choose the best substructure from the llternatives in S

The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he

implementation Section 322 descibes the comrutatlon used in StBDlE although otbe

methods may be used For instance the heutlStics could be weighted selected by lhe user or

selected by lhe backcround knowledae However uy substructure selecion method should

involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in

BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of

discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi

Section 23 are then used to construct a set of new substructures that are stored in E Those

subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the

loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res

23

CHAPTER 3

mE SUBDUE SYSTE~I

An implementation of he substructure discovery methodology descrIbed in the frevous

chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments

Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a

substructure concept discoverer and as a mod le in a more robust nachlne learning system In

addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a

sbstructure specialization module and a substructure background knowledge module for utiiizilg

prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd

krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres

ovillCh of these substruc1res are resent in the lnput examples

i

SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of

SubstructJre Speeializer-k of(

C ser-Supplied Input Exampies Substructure Discovery

Figure 31 Tlle SlSDLE System

(a) SUbstructure

[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]

(b) External Representation

I I

~ Sl

i V

~-T t -

I

~ Cl i

(c) Inte~al Represe~tation

Figure 32 Substructure Representation

21

SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do

SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI

return -EWSlBS

Figure 33 Substructure Elpanslon Algorithm

Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae

stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are

stored in For each neighboring relation a new substructure is formed by adding the nelghborrg

-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to

the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een

considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes

~xanded from the original substructure

322 Selection

The heuristic-based substructure discovery module selects for orslderation those

substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs

compactness connectivity and covenge These four heurlstus are used 0 order he set of

alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut

ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~

selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons

~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he

eurstlc value of a substructur

29

urlent set of Input xam-llS

deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e

substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le

Input examples

r rtlations Sl compactnns(S) = ----shy

For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus

conpactness bull 4 bull 1

The third heUristic connectivity measures the amount of extemal connection in 1he

occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of

external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be

onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed

connectivit-Spound) = locel

Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In

Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure

22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO

outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons

Thus connectivity (12 41 bull 13

T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es

31

ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt

First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~

tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as

there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle

substructure as denned in Section 3U The Object names within the substructures le aroltrar

symbols generated by the system for eacb ewly constructed substructure

S bull i lt 11 OBJECT-0001gt f

~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S

and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by

addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5

ae st)red In E

Input Example Substructure

amp i 511

~I Rl I- lSi ~

52 I S I

LJ L

Figure 3-1 Simple Example

33

11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~

1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l

le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~

suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~

by the algorabm

33 Sllbstructure Specialization

The substructure specialiution module In SLBDLE employs t simple teChlllq ue r

s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25

SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added

attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e

)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh

the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e

i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota

eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d

knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e

s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton

rocess

331 Specia1ization Algorithm

Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te

algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~

elamrles

he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~

Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l

osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~

eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e

35

~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor

Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle

mount of specializatlon In Srpc

A-rount_of_spKlaLization( S_) = --_______ (occl

The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture

acKiround knowledge along w1tb the originally discovered substrucure

332 Example

As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a

lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute

-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he

sare substruc~ure emerges as in Figure 3-

S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt

ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule

Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES

A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l

~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli

s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt

Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~

SPECSLBS In thIs example IS

S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt

T~is specialized substructure also las J occurrerces but 3 untque values thus

amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC

specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he

substrucnre background knowledge along with the oriiinally discovered substucur

SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon

about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be

s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred

substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq

SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue

nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization

nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be

mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec

s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks

3 SubstMlcture Background Kllowledge

T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts

39

O T SHA ETRL~GLE SHAPE-SQLARE

Figure 37 Substructure Backgolnd KnowledgeEumpie

ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt

su~ported

As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge

lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle

ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s

SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y

Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute

est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt

he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn

n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and

~1

1--- ~ed v blue ~

i I shyI

I~

I

I

I

I

SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE

Fgure 39 Three Background Knoledge $ucstroJcJres

he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e

new substructu-es In terms of the substructures already known

3 bull 42 IdentifyiD1 Substructures in Eumples

As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-

~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___

~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I

i SH~PETRIAGlE i t

I I

I I I SHAPEmiddotSQlARE

[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]

Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]

Fig-ure 310 Substructure IdentiAcation Eumple

substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles

disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks

--

lll=H Eumple - r

i III

I I H

8r- shyj

V V

Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red

C Value(S)a8

L 6 a

al uelt S)-312

alueltS)-21

0 alue(S)96

middotalue(S)-11

6 I

bt

10 ~ V

- -

I

alue(S)middot31~ aue(S~-96 -- - middotalue(S)92

A I

j

aI ValueS)-312 I

I I

I bull

---

- -I

Vaiue(S)-103

rgure 41 First Etample of EXiIrinent 1

elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS

of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll

EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$

overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0

consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r

be Input examples

2 Ex~riment 2 Specialization and Background Knowledge

The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem

Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0

~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS

lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege

During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task

As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut

discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs

acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa

Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed

subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk

The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure

-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e

cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s

shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese

discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized

substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed

rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces

the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed

substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe

background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to

deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe

background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres

Ii

C 0

H- C C - Ii C1

(Sr C1 rl

C C ~

C C C-H C C- ii C C-Ii 1 i

~ Sr- C C C

H-C C-Ii H C

Ii H

H

fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule

13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples

tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~

of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna

clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn

whIch to focus ile attentIon of the conceptual clusterI~ rocess

A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA

Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer

lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific

Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e

added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features

t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and

CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered

Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe

SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas

r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single

)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d

subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle

substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure

spannng ncre han one ualple

Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni

and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle

53

nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy

descrlOed il [10gensel87J the two best d usterngs discovered are

Color of engine Aheels IS Blackmiddot middotWhitemiddot

W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd

subs-~c~~re discovery =odule is

lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt

In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By

addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI

exa~Fles Je ~wo best lusterngs discovered lre

umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four

Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec

JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure

attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen

har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE

T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~

suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the

StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne

IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree

that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators

within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover

macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~

CiscJsses similrlties and differences between SLBDLE and PLoSD

Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure

of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The

Jperators forths conaln are taken from [isson80] and are repeated e10w

PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY

PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)

STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)

LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)

For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec

e

51

STACK(XZ)

~ CSTACKCYZ) PICKLPOi)

PCTDOW(Y)

Figure 49 Diseovered (alto-operator

system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy

vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be

~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex

45 Eltperiment 5 Data Abstraction and Feature Formation

EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he

cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3

arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng

on the same Tu3S Instruments Eplorer as the SLBDCE system

Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les

given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores

to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in

the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of

178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J

5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C

SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c

61

--

~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll

Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~

hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH

program searCles for tWo tpes of substrucure In tbe blocks world domain The first type

involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a

set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by

the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes

of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall

1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he

ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of

SlBDLE

lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S

cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce

OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence

~ ~ shyB

I ~ (

E G- c c=7 0 IA B CD

i Lr

(a) ( )

63

A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1

0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL

E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL

The tommcm-realiotships-list contairs the four relations Ossessed by more than half the

candidates

Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt

~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le

Sumber of propUiH In Intltwto

Number of propenlH n Ilnlon

vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-

ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre

A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5

65

~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e

ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~

difficult Running both tbe sequence and common relation discovery processes nore tban once

mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl

difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd

domain

T~e final comFarlson of 11e two syseS Involves ~he representation used to store the

discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the

subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe

sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot

lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1

roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse

Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines

comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe

semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto

oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0

SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert

background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l

more general epresentaticn such as the semantic network used y ~he ARCH pro~ran

53 COfDitive Optimization in olrs S~R Program

R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte

~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt

[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S

Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-

-(5-s)C =---shy

s

slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s

represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1

polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a

inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR

seeks a grammar whose eiementS m8llnize eVe

S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n

322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number

of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f

OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings

of he substrucre can be upressed as

Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy

cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not

conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on

he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into

~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture

lmprovements to tbe cognltive savings measure

~ Substructure Oiscovery ill Uihitehalls PLAD System

Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons

These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn

69

The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as

Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs

efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda

overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta

mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence

Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ

COXTEXT 1 ogsav macro OS

100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot

8 5 ~u - (~tmiddot z)middot

Instantiated Action Sequlnce lt B 112 Z ~11J Z

COlEXT 2 ogsav macro~s

20 ~l bull (~lumiddot Z)shy

Instantiated Action ~uence A B 11

Al interesting macrops discovered End

Figure 53 PLAXD Etam~le

1

OLPTER 6

COSC1USION

Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S

of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty

Url about such an environnent the entity nust abstract over unInuresting jetJil and discover

substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s

teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced

domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l

this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve

61 Summary

The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl

onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s

aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e

environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge

equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl

aleratves nvoiec In 3 computational method 01 discovering sustrucure

A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y

he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans

omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg

t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre

imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5

T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~

iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~

he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~

optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~

The ~ialization module modina tile substructure to apply in a more constrained ewironme

Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i

nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S

aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f

be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes

and the possible applications of the SLBDLE substructure discovery system in a vanety f

domaltls

Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf

~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-

lass vf problems in the area of nachine learning OJeraton In real-world domains demands f

eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates

n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll

of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes

Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1

S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~

62 Future Research

$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE

system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei

include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS

Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy

reVIOUS suess of silIlLu rJe appiiauns

esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of

substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can

tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]

lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can

be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions

in both the background knowledge architecture 3nd the opeations peforned by the discovery

algorlthm

Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess

may arise from a better method of substrlctie instantiation C rrent letbocs do not

satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation

by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C

lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres

exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba

Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods

would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery

process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt

the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be

discovered with a single application of the discovery algcnthr1

Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~

uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve

the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery

rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge

z~ess l ~

One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount

knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a

knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts

623 Applications

Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste

SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and

compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung

systems

One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage

composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e

eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon

slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of

abstraction

The n~ to access information fom larse databases has betooe oonlonlac In ncs

omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so

do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver

repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard

Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte

database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture

sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1

9

REFERE~CES

[Doyie79J

[HoaS3]

L --]~ ason

~Iicalski33bl

G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)

J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162

J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27

R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288

K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987

W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933

W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947

J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J

General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16

J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977

D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC

~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239

R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy

R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13

R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363

S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3

T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50

J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp

~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~

83

- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es

DeJ IS 11

dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T

s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11

nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11

race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T

Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order

=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je

~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form

(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I

The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC

The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure

ltela01s s the list ~f ~eatlons deining le substrucure r occurenc

In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage

dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec

85

gt bull ~

)~c1 tt I

0 c5 ~ -

13 Output for First Example of Experiment 1 Limit - 6

3qll rilebullbullbullbull

anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~

Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll

S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi

-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I

(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l

Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD

-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i

S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)

11 aCC

A14 Output for First Example of E(periment 1 Limit 10

gt svIdue liait 10)

n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t

Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco

T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~

S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO

~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS

~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J

A 16 Input for Second Example ot Experiment 1

Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J

conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)

Al7 Output for Second Example ot Etperitnent 1 Limit - 4

l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I

5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs

c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I

middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt

middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I

So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS

coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j

ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103

~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~

89

S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a

ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO

~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H

ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09

lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()

~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull

[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I

l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i

[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~

C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i

SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1

co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i

middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))

Ec ~lce

Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10

Betti ~Ice

i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~

umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1

SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -

1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G

91

s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no

TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~

~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f

=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ

~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i

A2 Experiment 2

Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd

saostructure background knowledge nodule Two examples from the chemical dooain ae

eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng

previously disc ered subst-uc~ures to subsequent discovery tasks

11 Input for First Eumple of poundXperimenl 2

kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i

S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))

slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)

sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J

~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI

Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))

VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)

( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I

(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i

(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I

sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)

Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j

SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)

~o e ) ~Wlnt ubstUC~oltes

Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo

95

gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~

U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l

~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J

carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )

Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))

Defxollrii ilill J ~cIl Clr carJ)

I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)

Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J

ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))

A32 Output for Experiment 3

gt lubclue inl~ JO)

Seli tlcebullbull_

m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil

SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo

aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot

ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~

licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)

Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc

wri OCCtilJtOiCS

101

4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4

~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i

(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))

42 Output for Experiment 4

PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil

Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)

ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl

u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1

O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~

s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI

W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -

~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a

S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1

TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a

103

~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~

Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-

bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)

ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)

I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))

r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l

(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)

(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)

coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)

Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i

(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at

Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e

(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ

~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj

OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs

I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten

~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull

9

SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1

I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1

Addina substructures 0 SKbullbullbull

O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1

5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I

A3 Experimeut 3

Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15

hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le

desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e

99

sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~

5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~

1~Ie c2 lt6 ~u)

QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32

c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)

lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J

SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))

A52 Output for Experiment S

gt (sulxh bullbull lmu )

Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll

RUMinl sulntucture discovey

DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl

Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~

WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11

( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I

(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I

[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt

~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90

Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J

ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot

~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I

10$

bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D

icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)

([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD

((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I

([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~

101

middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a

oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0

CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1

Ed ~lCe

109

Page 16: DISCOVERING SUBSTRUCTURE IN EXAMPLES

lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1

jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_

uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly

dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec

substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1

are applicable here

Regardless of how tae metbods are applied each metbod bas advantages and disadvantages

Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev

SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle

combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a

larger substructure but a smaller more desirable substructure may be overlooked in the process

Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS

referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones

linimal expansion begins with smaller substructures expands substructures along one relation

lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource

imlts of tbe system

2~ Substructure Selection

Aier using tbe metbods of the previous sectlon to construct l set of alternatlve

substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0

onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e

roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie

substructures oased on ~her heuristic quality This section presents the major heuristics that 3re

Jpplicable to substructure evaluation

The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata

cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie

savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e

13

Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad

eiatlons of ~be origmal OCClrre1lces may be reconstructea

T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=

replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla

obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue

ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of

concepts deAned In teClS of the Instantiated substructures

26 Substructure Specialization

Specializing substructures is an essential capabtlity of a substructure discovery system For

Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f

mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences

Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the

aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached

atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this

specialization step allows the substructure discovery system to take advantage of additional

information in the input examples avoid learning overly general substructure concepts and Clore

rapidly discover a specific disjunctive substructure concet T115 section resents an approach to

substructure specialization

One technique for specializing a substructure is to ~rform inductive Inference on the

extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the

substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended

occurrences consists of the substructures obtained by upanding each occurrence llth each

neighboring relation of the occurrence Once the set of eltended occurrences is onstructed

Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added

neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg

relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres

19

2 - Background Knowledampe

Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s

wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e

discovery process along more promising paths through tbe space of alternatlve substructJres T~lS

section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate

substructures that are more likely to list in tbe current application dOClaln

The ~wo major functions of tbe substructure backround knowledge are to main tain boh

lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a

glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of

substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu

substructures are defined in terms of more primitive substructures This arrangement suggests ~n

archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie

substructures serve as the justifications for more complex substructures at a higher level in tle

hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote

substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie

substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures

elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying

tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput

uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are

ultimately justified by the relations in tbe input eumples The substructure discoey roess

may then use these background knowledge substructures as a starting point for ieneratllg

alternative substructures

Other f Jrms of background knowledge apply to the task of substrlcure ilscovery

mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses

~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee

1

L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do

BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do

if (not (member e 0raquo S S U leI

return D

Figure 24 SubstrUcture Discovery ~lgoritbm

Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom

the substructures In S until either the computational limlt L is exceeded or the set of candidate

substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe

heuristics of Section 2A are employed to choose the best substructure from the llternatives in S

The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he

implementation Section 322 descibes the comrutatlon used in StBDlE although otbe

methods may be used For instance the heutlStics could be weighted selected by lhe user or

selected by lhe backcround knowledae However uy substructure selecion method should

involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in

BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of

discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi

Section 23 are then used to construct a set of new substructures that are stored in E Those

subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the

loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res

23

CHAPTER 3

mE SUBDUE SYSTE~I

An implementation of he substructure discovery methodology descrIbed in the frevous

chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments

Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a

substructure concept discoverer and as a mod le in a more robust nachlne learning system In

addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a

sbstructure specialization module and a substructure background knowledge module for utiiizilg

prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd

krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres

ovillCh of these substruc1res are resent in the lnput examples

i

SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of

SubstructJre Speeializer-k of(

C ser-Supplied Input Exampies Substructure Discovery

Figure 31 Tlle SlSDLE System

(a) SUbstructure

[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]

(b) External Representation

I I

~ Sl

i V

~-T t -

I

~ Cl i

(c) Inte~al Represe~tation

Figure 32 Substructure Representation

21

SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do

SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI

return -EWSlBS

Figure 33 Substructure Elpanslon Algorithm

Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae

stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are

stored in For each neighboring relation a new substructure is formed by adding the nelghborrg

-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to

the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een

considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes

~xanded from the original substructure

322 Selection

The heuristic-based substructure discovery module selects for orslderation those

substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs

compactness connectivity and covenge These four heurlstus are used 0 order he set of

alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut

ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~

selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons

~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he

eurstlc value of a substructur

29

urlent set of Input xam-llS

deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e

substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le

Input examples

r rtlations Sl compactnns(S) = ----shy

For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus

conpactness bull 4 bull 1

The third heUristic connectivity measures the amount of extemal connection in 1he

occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of

external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be

onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed

connectivit-Spound) = locel

Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In

Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure

22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO

outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons

Thus connectivity (12 41 bull 13

T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es

31

ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt

First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~

tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as

there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle

substructure as denned in Section 3U The Object names within the substructures le aroltrar

symbols generated by the system for eacb ewly constructed substructure

S bull i lt 11 OBJECT-0001gt f

~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S

and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by

addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5

ae st)red In E

Input Example Substructure

amp i 511

~I Rl I- lSi ~

52 I S I

LJ L

Figure 3-1 Simple Example

33

11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~

1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l

le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~

suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~

by the algorabm

33 Sllbstructure Specialization

The substructure specialiution module In SLBDLE employs t simple teChlllq ue r

s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25

SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added

attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e

)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh

the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e

i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota

eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d

knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e

s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton

rocess

331 Specia1ization Algorithm

Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te

algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~

elamrles

he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~

Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l

osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~

eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e

35

~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor

Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle

mount of specializatlon In Srpc

A-rount_of_spKlaLization( S_) = --_______ (occl

The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture

acKiround knowledge along w1tb the originally discovered substrucure

332 Example

As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a

lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute

-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he

sare substruc~ure emerges as in Figure 3-

S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt

ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule

Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES

A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l

~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli

s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt

Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~

SPECSLBS In thIs example IS

S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt

T~is specialized substructure also las J occurrerces but 3 untque values thus

amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC

specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he

substrucnre background knowledge along with the oriiinally discovered substucur

SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon

about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be

s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred

substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq

SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue

nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization

nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be

mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec

s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks

3 SubstMlcture Background Kllowledge

T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts

39

O T SHA ETRL~GLE SHAPE-SQLARE

Figure 37 Substructure Backgolnd KnowledgeEumpie

ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt

su~ported

As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge

lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle

ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s

SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y

Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute

est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt

he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn

n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and

~1

1--- ~ed v blue ~

i I shyI

I~

I

I

I

I

SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE

Fgure 39 Three Background Knoledge $ucstroJcJres

he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e

new substructu-es In terms of the substructures already known

3 bull 42 IdentifyiD1 Substructures in Eumples

As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-

~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___

~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I

i SH~PETRIAGlE i t

I I

I I I SHAPEmiddotSQlARE

[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]

Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]

Fig-ure 310 Substructure IdentiAcation Eumple

substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles

disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks

--

lll=H Eumple - r

i III

I I H

8r- shyj

V V

Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red

C Value(S)a8

L 6 a

al uelt S)-312

alueltS)-21

0 alue(S)96

middotalue(S)-11

6 I

bt

10 ~ V

- -

I

alue(S)middot31~ aue(S~-96 -- - middotalue(S)92

A I

j

aI ValueS)-312 I

I I

I bull

---

- -I

Vaiue(S)-103

rgure 41 First Etample of EXiIrinent 1

elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS

of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll

EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$

overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0

consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r

be Input examples

2 Ex~riment 2 Specialization and Background Knowledge

The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem

Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0

~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS

lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege

During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task

As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut

discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs

acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa

Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed

subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk

The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure

-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e

cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s

shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese

discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized

substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed

rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces

the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed

substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe

background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to

deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe

background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres

Ii

C 0

H- C C - Ii C1

(Sr C1 rl

C C ~

C C C-H C C- ii C C-Ii 1 i

~ Sr- C C C

H-C C-Ii H C

Ii H

H

fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule

13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples

tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~

of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna

clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn

whIch to focus ile attentIon of the conceptual clusterI~ rocess

A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA

Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer

lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific

Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e

added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features

t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and

CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered

Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe

SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas

r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single

)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d

subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle

substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure

spannng ncre han one ualple

Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni

and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle

53

nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy

descrlOed il [10gensel87J the two best d usterngs discovered are

Color of engine Aheels IS Blackmiddot middotWhitemiddot

W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd

subs-~c~~re discovery =odule is

lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt

In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By

addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI

exa~Fles Je ~wo best lusterngs discovered lre

umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four

Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec

JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure

attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen

har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE

T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~

suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the

StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne

IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree

that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators

within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover

macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~

CiscJsses similrlties and differences between SLBDLE and PLoSD

Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure

of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The

Jperators forths conaln are taken from [isson80] and are repeated e10w

PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY

PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)

STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)

LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)

For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec

e

51

STACK(XZ)

~ CSTACKCYZ) PICKLPOi)

PCTDOW(Y)

Figure 49 Diseovered (alto-operator

system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy

vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be

~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex

45 Eltperiment 5 Data Abstraction and Feature Formation

EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he

cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3

arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng

on the same Tu3S Instruments Eplorer as the SLBDCE system

Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les

given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores

to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in

the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of

178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J

5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C

SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c

61

--

~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll

Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~

hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH

program searCles for tWo tpes of substrucure In tbe blocks world domain The first type

involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a

set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by

the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes

of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall

1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he

ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of

SlBDLE

lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S

cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce

OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence

~ ~ shyB

I ~ (

E G- c c=7 0 IA B CD

i Lr

(a) ( )

63

A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1

0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL

E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL

The tommcm-realiotships-list contairs the four relations Ossessed by more than half the

candidates

Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt

~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le

Sumber of propUiH In Intltwto

Number of propenlH n Ilnlon

vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-

ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre

A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5

65

~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e

ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~

difficult Running both tbe sequence and common relation discovery processes nore tban once

mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl

difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd

domain

T~e final comFarlson of 11e two syseS Involves ~he representation used to store the

discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the

subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe

sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot

lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1

roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse

Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines

comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe

semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto

oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0

SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert

background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l

more general epresentaticn such as the semantic network used y ~he ARCH pro~ran

53 COfDitive Optimization in olrs S~R Program

R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte

~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt

[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S

Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-

-(5-s)C =---shy

s

slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s

represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1

polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a

inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR

seeks a grammar whose eiementS m8llnize eVe

S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n

322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number

of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f

OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings

of he substrucre can be upressed as

Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy

cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not

conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on

he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into

~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture

lmprovements to tbe cognltive savings measure

~ Substructure Oiscovery ill Uihitehalls PLAD System

Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons

These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn

69

The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as

Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs

efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda

overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta

mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence

Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ

COXTEXT 1 ogsav macro OS

100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot

8 5 ~u - (~tmiddot z)middot

Instantiated Action Sequlnce lt B 112 Z ~11J Z

COlEXT 2 ogsav macro~s

20 ~l bull (~lumiddot Z)shy

Instantiated Action ~uence A B 11

Al interesting macrops discovered End

Figure 53 PLAXD Etam~le

1

OLPTER 6

COSC1USION

Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S

of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty

Url about such an environnent the entity nust abstract over unInuresting jetJil and discover

substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s

teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced

domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l

this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve

61 Summary

The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl

onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s

aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e

environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge

equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl

aleratves nvoiec In 3 computational method 01 discovering sustrucure

A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y

he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans

omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg

t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre

imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5

T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~

iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~

he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~

optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~

The ~ialization module modina tile substructure to apply in a more constrained ewironme

Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i

nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S

aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f

be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes

and the possible applications of the SLBDLE substructure discovery system in a vanety f

domaltls

Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf

~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-

lass vf problems in the area of nachine learning OJeraton In real-world domains demands f

eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates

n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll

of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes

Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1

S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~

62 Future Research

$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE

system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei

include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS

Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy

reVIOUS suess of silIlLu rJe appiiauns

esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of

substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can

tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]

lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can

be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions

in both the background knowledge architecture 3nd the opeations peforned by the discovery

algorlthm

Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess

may arise from a better method of substrlctie instantiation C rrent letbocs do not

satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation

by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C

lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres

exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba

Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods

would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery

process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt

the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be

discovered with a single application of the discovery algcnthr1

Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~

uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve

the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery

rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge

z~ess l ~

One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount

knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a

knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts

623 Applications

Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste

SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and

compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung

systems

One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage

composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e

eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon

slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of

abstraction

The n~ to access information fom larse databases has betooe oonlonlac In ncs

omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so

do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver

repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard

Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte

database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture

sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1

9

REFERE~CES

[Doyie79J

[HoaS3]

L --]~ ason

~Iicalski33bl

G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)

J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162

J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27

R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288

K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987

W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933

W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947

J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J

General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16

J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977

D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC

~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239

R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy

R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13

R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363

S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3

T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50

J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp

~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~

83

- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es

DeJ IS 11

dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T

s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11

nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11

race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T

Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order

=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je

~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form

(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I

The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC

The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure

ltela01s s the list ~f ~eatlons deining le substrucure r occurenc

In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage

dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec

85

gt bull ~

)~c1 tt I

0 c5 ~ -

13 Output for First Example of Experiment 1 Limit - 6

3qll rilebullbullbullbull

anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~

Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll

S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi

-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I

(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l

Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD

-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i

S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)

11 aCC

A14 Output for First Example of E(periment 1 Limit 10

gt svIdue liait 10)

n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t

Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco

T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~

S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO

~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS

~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J

A 16 Input for Second Example ot Experiment 1

Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J

conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)

Al7 Output for Second Example ot Etperitnent 1 Limit - 4

l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I

5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs

c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I

middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt

middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I

So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS

coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j

ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103

~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~

89

S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a

ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO

~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H

ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09

lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()

~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull

[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I

l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i

[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~

C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i

SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1

co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i

middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))

Ec ~lce

Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10

Betti ~Ice

i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~

umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1

SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -

1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G

91

s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no

TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~

~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f

=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ

~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i

A2 Experiment 2

Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd

saostructure background knowledge nodule Two examples from the chemical dooain ae

eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng

previously disc ered subst-uc~ures to subsequent discovery tasks

11 Input for First Eumple of poundXperimenl 2

kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i

S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))

slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)

sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J

~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI

Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))

VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)

( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I

(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i

(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I

sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)

Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j

SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)

~o e ) ~Wlnt ubstUC~oltes

Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo

95

gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~

U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l

~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J

carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )

Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))

Defxollrii ilill J ~cIl Clr carJ)

I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)

Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J

ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))

A32 Output for Experiment 3

gt lubclue inl~ JO)

Seli tlcebullbull_

m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil

SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo

aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot

ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~

licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)

Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc

wri OCCtilJtOiCS

101

4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4

~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i

(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))

42 Output for Experiment 4

PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil

Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)

ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl

u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1

O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~

s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI

W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -

~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a

S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1

TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a

103

~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~

Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-

bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)

ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)

I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))

r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l

(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)

(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)

coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)

Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i

(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at

Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e

(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ

~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj

OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs

I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten

~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull

9

SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1

I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1

Addina substructures 0 SKbullbullbull

O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1

5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I

A3 Experimeut 3

Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15

hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le

desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e

99

sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~

5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~

1~Ie c2 lt6 ~u)

QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32

c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)

lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J

SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))

A52 Output for Experiment S

gt (sulxh bullbull lmu )

Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll

RUMinl sulntucture discovey

DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl

Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~

WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11

( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I

(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I

[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt

~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90

Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J

ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot

~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I

10$

bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D

icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)

([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD

((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I

([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~

101

middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a

oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0

CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1

Ed ~lCe

109

Page 17: DISCOVERING SUBSTRUCTURE IN EXAMPLES

Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad

eiatlons of ~be origmal OCClrre1lces may be reconstructea

T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=

replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla

obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue

ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of

concepts deAned In teClS of the Instantiated substructures

26 Substructure Specialization

Specializing substructures is an essential capabtlity of a substructure discovery system For

Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f

mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences

Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the

aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached

atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this

specialization step allows the substructure discovery system to take advantage of additional

information in the input examples avoid learning overly general substructure concepts and Clore

rapidly discover a specific disjunctive substructure concet T115 section resents an approach to

substructure specialization

One technique for specializing a substructure is to ~rform inductive Inference on the

extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the

substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended

occurrences consists of the substructures obtained by upanding each occurrence llth each

neighboring relation of the occurrence Once the set of eltended occurrences is onstructed

Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added

neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg

relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres

19

2 - Background Knowledampe

Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s

wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e

discovery process along more promising paths through tbe space of alternatlve substructJres T~lS

section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate

substructures that are more likely to list in tbe current application dOClaln

The ~wo major functions of tbe substructure backround knowledge are to main tain boh

lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a

glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of

substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu

substructures are defined in terms of more primitive substructures This arrangement suggests ~n

archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie

substructures serve as the justifications for more complex substructures at a higher level in tle

hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote

substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie

substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures

elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying

tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput

uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are

ultimately justified by the relations in tbe input eumples The substructure discoey roess

may then use these background knowledge substructures as a starting point for ieneratllg

alternative substructures

Other f Jrms of background knowledge apply to the task of substrlcure ilscovery

mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses

~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee

1

L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do

BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do

if (not (member e 0raquo S S U leI

return D

Figure 24 SubstrUcture Discovery ~lgoritbm

Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom

the substructures In S until either the computational limlt L is exceeded or the set of candidate

substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe

heuristics of Section 2A are employed to choose the best substructure from the llternatives in S

The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he

implementation Section 322 descibes the comrutatlon used in StBDlE although otbe

methods may be used For instance the heutlStics could be weighted selected by lhe user or

selected by lhe backcround knowledae However uy substructure selecion method should

involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in

BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of

discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi

Section 23 are then used to construct a set of new substructures that are stored in E Those

subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the

loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res

23

CHAPTER 3

mE SUBDUE SYSTE~I

An implementation of he substructure discovery methodology descrIbed in the frevous

chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments

Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a

substructure concept discoverer and as a mod le in a more robust nachlne learning system In

addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a

sbstructure specialization module and a substructure background knowledge module for utiiizilg

prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd

krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres

ovillCh of these substruc1res are resent in the lnput examples

i

SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of

SubstructJre Speeializer-k of(

C ser-Supplied Input Exampies Substructure Discovery

Figure 31 Tlle SlSDLE System

(a) SUbstructure

[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]

(b) External Representation

I I

~ Sl

i V

~-T t -

I

~ Cl i

(c) Inte~al Represe~tation

Figure 32 Substructure Representation

21

SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do

SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI

return -EWSlBS

Figure 33 Substructure Elpanslon Algorithm

Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae

stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are

stored in For each neighboring relation a new substructure is formed by adding the nelghborrg

-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to

the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een

considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes

~xanded from the original substructure

322 Selection

The heuristic-based substructure discovery module selects for orslderation those

substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs

compactness connectivity and covenge These four heurlstus are used 0 order he set of

alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut

ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~

selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons

~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he

eurstlc value of a substructur

29

urlent set of Input xam-llS

deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e

substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le

Input examples

r rtlations Sl compactnns(S) = ----shy

For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus

conpactness bull 4 bull 1

The third heUristic connectivity measures the amount of extemal connection in 1he

occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of

external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be

onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed

connectivit-Spound) = locel

Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In

Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure

22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO

outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons

Thus connectivity (12 41 bull 13

T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es

31

ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt

First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~

tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as

there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle

substructure as denned in Section 3U The Object names within the substructures le aroltrar

symbols generated by the system for eacb ewly constructed substructure

S bull i lt 11 OBJECT-0001gt f

~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S

and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by

addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5

ae st)red In E

Input Example Substructure

amp i 511

~I Rl I- lSi ~

52 I S I

LJ L

Figure 3-1 Simple Example

33

11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~

1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l

le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~

suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~

by the algorabm

33 Sllbstructure Specialization

The substructure specialiution module In SLBDLE employs t simple teChlllq ue r

s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25

SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added

attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e

)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh

the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e

i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota

eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d

knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e

s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton

rocess

331 Specia1ization Algorithm

Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te

algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~

elamrles

he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~

Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l

osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~

eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e

35

~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor

Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle

mount of specializatlon In Srpc

A-rount_of_spKlaLization( S_) = --_______ (occl

The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture

acKiround knowledge along w1tb the originally discovered substrucure

332 Example

As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a

lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute

-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he

sare substruc~ure emerges as in Figure 3-

S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt

ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule

Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES

A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l

~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli

s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt

Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~

SPECSLBS In thIs example IS

S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt

T~is specialized substructure also las J occurrerces but 3 untque values thus

amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC

specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he

substrucnre background knowledge along with the oriiinally discovered substucur

SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon

about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be

s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred

substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq

SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue

nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization

nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be

mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec

s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks

3 SubstMlcture Background Kllowledge

T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts

39

O T SHA ETRL~GLE SHAPE-SQLARE

Figure 37 Substructure Backgolnd KnowledgeEumpie

ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt

su~ported

As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge

lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle

ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s

SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y

Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute

est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt

he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn

n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and

~1

1--- ~ed v blue ~

i I shyI

I~

I

I

I

I

SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE

Fgure 39 Three Background Knoledge $ucstroJcJres

he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e

new substructu-es In terms of the substructures already known

3 bull 42 IdentifyiD1 Substructures in Eumples

As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-

~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___

~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I

i SH~PETRIAGlE i t

I I

I I I SHAPEmiddotSQlARE

[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]

Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]

Fig-ure 310 Substructure IdentiAcation Eumple

substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles

disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks

--

lll=H Eumple - r

i III

I I H

8r- shyj

V V

Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red

C Value(S)a8

L 6 a

al uelt S)-312

alueltS)-21

0 alue(S)96

middotalue(S)-11

6 I

bt

10 ~ V

- -

I

alue(S)middot31~ aue(S~-96 -- - middotalue(S)92

A I

j

aI ValueS)-312 I

I I

I bull

---

- -I

Vaiue(S)-103

rgure 41 First Etample of EXiIrinent 1

elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS

of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll

EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$

overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0

consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r

be Input examples

2 Ex~riment 2 Specialization and Background Knowledge

The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem

Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0

~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS

lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege

During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task

As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut

discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs

acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa

Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed

subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk

The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure

-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e

cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s

shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese

discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized

substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed

rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces

the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed

substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe

background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to

deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe

background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres

Ii

C 0

H- C C - Ii C1

(Sr C1 rl

C C ~

C C C-H C C- ii C C-Ii 1 i

~ Sr- C C C

H-C C-Ii H C

Ii H

H

fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule

13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples

tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~

of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna

clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn

whIch to focus ile attentIon of the conceptual clusterI~ rocess

A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA

Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer

lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific

Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e

added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features

t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and

CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered

Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe

SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas

r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single

)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d

subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle

substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure

spannng ncre han one ualple

Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni

and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle

53

nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy

descrlOed il [10gensel87J the two best d usterngs discovered are

Color of engine Aheels IS Blackmiddot middotWhitemiddot

W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd

subs-~c~~re discovery =odule is

lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt

In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By

addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI

exa~Fles Je ~wo best lusterngs discovered lre

umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four

Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec

JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure

attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen

har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE

T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~

suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the

StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne

IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree

that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators

within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover

macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~

CiscJsses similrlties and differences between SLBDLE and PLoSD

Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure

of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The

Jperators forths conaln are taken from [isson80] and are repeated e10w

PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY

PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)

STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)

LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)

For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec

e

51

STACK(XZ)

~ CSTACKCYZ) PICKLPOi)

PCTDOW(Y)

Figure 49 Diseovered (alto-operator

system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy

vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be

~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex

45 Eltperiment 5 Data Abstraction and Feature Formation

EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he

cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3

arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng

on the same Tu3S Instruments Eplorer as the SLBDCE system

Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les

given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores

to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in

the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of

178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J

5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C

SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c

61

--

~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll

Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~

hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH

program searCles for tWo tpes of substrucure In tbe blocks world domain The first type

involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a

set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by

the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes

of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall

1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he

ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of

SlBDLE

lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S

cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce

OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence

~ ~ shyB

I ~ (

E G- c c=7 0 IA B CD

i Lr

(a) ( )

63

A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1

0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL

E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL

The tommcm-realiotships-list contairs the four relations Ossessed by more than half the

candidates

Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt

~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le

Sumber of propUiH In Intltwto

Number of propenlH n Ilnlon

vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-

ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre

A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5

65

~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e

ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~

difficult Running both tbe sequence and common relation discovery processes nore tban once

mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl

difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd

domain

T~e final comFarlson of 11e two syseS Involves ~he representation used to store the

discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the

subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe

sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot

lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1

roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse

Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines

comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe

semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto

oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0

SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert

background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l

more general epresentaticn such as the semantic network used y ~he ARCH pro~ran

53 COfDitive Optimization in olrs S~R Program

R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte

~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt

[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S

Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-

-(5-s)C =---shy

s

slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s

represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1

polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a

inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR

seeks a grammar whose eiementS m8llnize eVe

S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n

322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number

of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f

OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings

of he substrucre can be upressed as

Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy

cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not

conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on

he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into

~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture

lmprovements to tbe cognltive savings measure

~ Substructure Oiscovery ill Uihitehalls PLAD System

Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons

These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn

69

The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as

Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs

efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda

overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta

mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence

Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ

COXTEXT 1 ogsav macro OS

100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot

8 5 ~u - (~tmiddot z)middot

Instantiated Action Sequlnce lt B 112 Z ~11J Z

COlEXT 2 ogsav macro~s

20 ~l bull (~lumiddot Z)shy

Instantiated Action ~uence A B 11

Al interesting macrops discovered End

Figure 53 PLAXD Etam~le

1

OLPTER 6

COSC1USION

Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S

of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty

Url about such an environnent the entity nust abstract over unInuresting jetJil and discover

substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s

teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced

domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l

this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve

61 Summary

The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl

onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s

aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e

environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge

equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl

aleratves nvoiec In 3 computational method 01 discovering sustrucure

A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y

he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans

omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg

t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre

imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5

T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~

iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~

he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~

optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~

The ~ialization module modina tile substructure to apply in a more constrained ewironme

Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i

nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S

aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f

be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes

and the possible applications of the SLBDLE substructure discovery system in a vanety f

domaltls

Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf

~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-

lass vf problems in the area of nachine learning OJeraton In real-world domains demands f

eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates

n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll

of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes

Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1

S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~

62 Future Research

$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE

system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei

include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS

Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy

reVIOUS suess of silIlLu rJe appiiauns

esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of

substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can

tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]

lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can

be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions

in both the background knowledge architecture 3nd the opeations peforned by the discovery

algorlthm

Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess

may arise from a better method of substrlctie instantiation C rrent letbocs do not

satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation

by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C

lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres

exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba

Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods

would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery

process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt

the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be

discovered with a single application of the discovery algcnthr1

Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~

uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve

the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery

rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge

z~ess l ~

One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount

knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a

knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts

623 Applications

Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste

SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and

compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung

systems

One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage

composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e

eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon

slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of

abstraction

The n~ to access information fom larse databases has betooe oonlonlac In ncs

omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so

do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver

repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard

Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte

database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture

sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1

9

REFERE~CES

[Doyie79J

[HoaS3]

L --]~ ason

~Iicalski33bl

G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)

J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162

J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27

R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288

K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987

W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933

W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947

J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J

General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16

J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977

D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC

~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239

R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy

R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13

R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363

S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3

T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50

J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp

~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~

83

- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es

DeJ IS 11

dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T

s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11

nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11

race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T

Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order

=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je

~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form

(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I

The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC

The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure

ltela01s s the list ~f ~eatlons deining le substrucure r occurenc

In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage

dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec

85

gt bull ~

)~c1 tt I

0 c5 ~ -

13 Output for First Example of Experiment 1 Limit - 6

3qll rilebullbullbullbull

anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~

Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll

S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi

-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I

(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l

Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD

-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i

S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)

11 aCC

A14 Output for First Example of E(periment 1 Limit 10

gt svIdue liait 10)

n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t

Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco

T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~

S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO

~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS

~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J

A 16 Input for Second Example ot Experiment 1

Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J

conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)

Al7 Output for Second Example ot Etperitnent 1 Limit - 4

l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I

5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs

c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I

middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt

middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I

So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS

coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j

ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103

~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~

89

S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a

ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO

~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H

ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09

lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()

~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull

[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I

l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i

[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~

C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i

SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1

co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i

middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))

Ec ~lce

Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10

Betti ~Ice

i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~

umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1

SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -

1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G

91

s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no

TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~

~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f

=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ

~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i

A2 Experiment 2

Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd

saostructure background knowledge nodule Two examples from the chemical dooain ae

eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng

previously disc ered subst-uc~ures to subsequent discovery tasks

11 Input for First Eumple of poundXperimenl 2

kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i

S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))

slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)

sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J

~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI

Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))

VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)

( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I

(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i

(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I

sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)

Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j

SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)

~o e ) ~Wlnt ubstUC~oltes

Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo

95

gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~

U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l

~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J

carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )

Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))

Defxollrii ilill J ~cIl Clr carJ)

I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)

Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J

ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))

A32 Output for Experiment 3

gt lubclue inl~ JO)

Seli tlcebullbull_

m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil

SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo

aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot

ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~

licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)

Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc

wri OCCtilJtOiCS

101

4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4

~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i

(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))

42 Output for Experiment 4

PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil

Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)

ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl

u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1

O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~

s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI

W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -

~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a

S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1

TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a

103

~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~

Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-

bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)

ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)

I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))

r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l

(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)

(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)

coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)

Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i

(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at

Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e

(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ

~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj

OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs

I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten

~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull

9

SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1

I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1

Addina substructures 0 SKbullbullbull

O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1

5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I

A3 Experimeut 3

Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15

hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le

desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e

99

sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~

5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~

1~Ie c2 lt6 ~u)

QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32

c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)

lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J

SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))

A52 Output for Experiment S

gt (sulxh bullbull lmu )

Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll

RUMinl sulntucture discovey

DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl

Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~

WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11

( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I

(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I

[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt

~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90

Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J

ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot

~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I

10$

bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D

icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)

([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD

((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I

([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~

101

middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a

oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0

CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1

Ed ~lCe

109

Page 18: DISCOVERING SUBSTRUCTURE IN EXAMPLES

2 - Background Knowledampe

Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s

wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e

discovery process along more promising paths through tbe space of alternatlve substructJres T~lS

section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate

substructures that are more likely to list in tbe current application dOClaln

The ~wo major functions of tbe substructure backround knowledge are to main tain boh

lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a

glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of

substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu

substructures are defined in terms of more primitive substructures This arrangement suggests ~n

archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie

substructures serve as the justifications for more complex substructures at a higher level in tle

hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote

substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie

substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures

elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying

tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput

uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are

ultimately justified by the relations in tbe input eumples The substructure discoey roess

may then use these background knowledge substructures as a starting point for ieneratllg

alternative substructures

Other f Jrms of background knowledge apply to the task of substrlcure ilscovery

mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses

~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee

1

L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do

BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do

if (not (member e 0raquo S S U leI

return D

Figure 24 SubstrUcture Discovery ~lgoritbm

Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom

the substructures In S until either the computational limlt L is exceeded or the set of candidate

substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe

heuristics of Section 2A are employed to choose the best substructure from the llternatives in S

The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he

implementation Section 322 descibes the comrutatlon used in StBDlE although otbe

methods may be used For instance the heutlStics could be weighted selected by lhe user or

selected by lhe backcround knowledae However uy substructure selecion method should

involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in

BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of

discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi

Section 23 are then used to construct a set of new substructures that are stored in E Those

subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the

loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res

23

CHAPTER 3

mE SUBDUE SYSTE~I

An implementation of he substructure discovery methodology descrIbed in the frevous

chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments

Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a

substructure concept discoverer and as a mod le in a more robust nachlne learning system In

addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a

sbstructure specialization module and a substructure background knowledge module for utiiizilg

prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd

krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres

ovillCh of these substruc1res are resent in the lnput examples

i

SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of

SubstructJre Speeializer-k of(

C ser-Supplied Input Exampies Substructure Discovery

Figure 31 Tlle SlSDLE System

(a) SUbstructure

[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]

(b) External Representation

I I

~ Sl

i V

~-T t -

I

~ Cl i

(c) Inte~al Represe~tation

Figure 32 Substructure Representation

21

SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do

SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI

return -EWSlBS

Figure 33 Substructure Elpanslon Algorithm

Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae

stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are

stored in For each neighboring relation a new substructure is formed by adding the nelghborrg

-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to

the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een

considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes

~xanded from the original substructure

322 Selection

The heuristic-based substructure discovery module selects for orslderation those

substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs

compactness connectivity and covenge These four heurlstus are used 0 order he set of

alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut

ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~

selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons

~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he

eurstlc value of a substructur

29

urlent set of Input xam-llS

deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e

substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le

Input examples

r rtlations Sl compactnns(S) = ----shy

For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus

conpactness bull 4 bull 1

The third heUristic connectivity measures the amount of extemal connection in 1he

occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of

external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be

onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed

connectivit-Spound) = locel

Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In

Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure

22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO

outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons

Thus connectivity (12 41 bull 13

T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es

31

ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt

First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~

tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as

there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle

substructure as denned in Section 3U The Object names within the substructures le aroltrar

symbols generated by the system for eacb ewly constructed substructure

S bull i lt 11 OBJECT-0001gt f

~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S

and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by

addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5

ae st)red In E

Input Example Substructure

amp i 511

~I Rl I- lSi ~

52 I S I

LJ L

Figure 3-1 Simple Example

33

11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~

1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l

le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~

suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~

by the algorabm

33 Sllbstructure Specialization

The substructure specialiution module In SLBDLE employs t simple teChlllq ue r

s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25

SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added

attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e

)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh

the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e

i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota

eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d

knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e

s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton

rocess

331 Specia1ization Algorithm

Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te

algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~

elamrles

he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~

Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l

osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~

eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e

35

~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor

Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle

mount of specializatlon In Srpc

A-rount_of_spKlaLization( S_) = --_______ (occl

The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture

acKiround knowledge along w1tb the originally discovered substrucure

332 Example

As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a

lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute

-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he

sare substruc~ure emerges as in Figure 3-

S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt

ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule

Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES

A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l

~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli

s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt

Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~

SPECSLBS In thIs example IS

S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt

T~is specialized substructure also las J occurrerces but 3 untque values thus

amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC

specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he

substrucnre background knowledge along with the oriiinally discovered substucur

SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon

about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be

s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred

substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq

SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue

nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization

nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be

mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec

s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks

3 SubstMlcture Background Kllowledge

T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts

39

O T SHA ETRL~GLE SHAPE-SQLARE

Figure 37 Substructure Backgolnd KnowledgeEumpie

ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt

su~ported

As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge

lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle

ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s

SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y

Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute

est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt

he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn

n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and

~1

1--- ~ed v blue ~

i I shyI

I~

I

I

I

I

SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE

Fgure 39 Three Background Knoledge $ucstroJcJres

he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e

new substructu-es In terms of the substructures already known

3 bull 42 IdentifyiD1 Substructures in Eumples

As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-

~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___

~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I

i SH~PETRIAGlE i t

I I

I I I SHAPEmiddotSQlARE

[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]

Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]

Fig-ure 310 Substructure IdentiAcation Eumple

substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles

disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks

--

lll=H Eumple - r

i III

I I H

8r- shyj

V V

Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red

C Value(S)a8

L 6 a

al uelt S)-312

alueltS)-21

0 alue(S)96

middotalue(S)-11

6 I

bt

10 ~ V

- -

I

alue(S)middot31~ aue(S~-96 -- - middotalue(S)92

A I

j

aI ValueS)-312 I

I I

I bull

---

- -I

Vaiue(S)-103

rgure 41 First Etample of EXiIrinent 1

elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS

of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll

EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$

overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0

consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r

be Input examples

2 Ex~riment 2 Specialization and Background Knowledge

The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem

Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0

~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS

lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege

During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task

As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut

discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs

acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa

Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed

subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk

The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure

-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e

cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s

shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese

discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized

substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed

rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces

the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed

substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe

background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to

deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe

background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres

Ii

C 0

H- C C - Ii C1

(Sr C1 rl

C C ~

C C C-H C C- ii C C-Ii 1 i

~ Sr- C C C

H-C C-Ii H C

Ii H

H

fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule

13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples

tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~

of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna

clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn

whIch to focus ile attentIon of the conceptual clusterI~ rocess

A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA

Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer

lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific

Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e

added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features

t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and

CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered

Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe

SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas

r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single

)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d

subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle

substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure

spannng ncre han one ualple

Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni

and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle

53

nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy

descrlOed il [10gensel87J the two best d usterngs discovered are

Color of engine Aheels IS Blackmiddot middotWhitemiddot

W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd

subs-~c~~re discovery =odule is

lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt

In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By

addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI

exa~Fles Je ~wo best lusterngs discovered lre

umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four

Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec

JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure

attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen

har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE

T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~

suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the

StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne

IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree

that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators

within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover

macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~

CiscJsses similrlties and differences between SLBDLE and PLoSD

Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure

of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The

Jperators forths conaln are taken from [isson80] and are repeated e10w

PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY

PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)

STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)

LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)

For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec

e

51

STACK(XZ)

~ CSTACKCYZ) PICKLPOi)

PCTDOW(Y)

Figure 49 Diseovered (alto-operator

system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy

vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be

~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex

45 Eltperiment 5 Data Abstraction and Feature Formation

EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he

cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3

arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng

on the same Tu3S Instruments Eplorer as the SLBDCE system

Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les

given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores

to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in

the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of

178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J

5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C

SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c

61

--

~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll

Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~

hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH

program searCles for tWo tpes of substrucure In tbe blocks world domain The first type

involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a

set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by

the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes

of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall

1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he

ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of

SlBDLE

lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S

cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce

OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence

~ ~ shyB

I ~ (

E G- c c=7 0 IA B CD

i Lr

(a) ( )

63

A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1

0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL

E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL

The tommcm-realiotships-list contairs the four relations Ossessed by more than half the

candidates

Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt

~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le

Sumber of propUiH In Intltwto

Number of propenlH n Ilnlon

vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-

ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre

A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5

65

~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e

ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~

difficult Running both tbe sequence and common relation discovery processes nore tban once

mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl

difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd

domain

T~e final comFarlson of 11e two syseS Involves ~he representation used to store the

discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the

subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe

sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot

lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1

roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse

Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines

comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe

semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto

oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0

SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert

background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l

more general epresentaticn such as the semantic network used y ~he ARCH pro~ran

53 COfDitive Optimization in olrs S~R Program

R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte

~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt

[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S

Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-

-(5-s)C =---shy

s

slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s

represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1

polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a

inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR

seeks a grammar whose eiementS m8llnize eVe

S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n

322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number

of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f

OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings

of he substrucre can be upressed as

Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy

cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not

conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on

he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into

~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture

lmprovements to tbe cognltive savings measure

~ Substructure Oiscovery ill Uihitehalls PLAD System

Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons

These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn

69

The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as

Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs

efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda

overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta

mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence

Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ

COXTEXT 1 ogsav macro OS

100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot

8 5 ~u - (~tmiddot z)middot

Instantiated Action Sequlnce lt B 112 Z ~11J Z

COlEXT 2 ogsav macro~s

20 ~l bull (~lumiddot Z)shy

Instantiated Action ~uence A B 11

Al interesting macrops discovered End

Figure 53 PLAXD Etam~le

1

OLPTER 6

COSC1USION

Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S

of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty

Url about such an environnent the entity nust abstract over unInuresting jetJil and discover

substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s

teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced

domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l

this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve

61 Summary

The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl

onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s

aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e

environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge

equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl

aleratves nvoiec In 3 computational method 01 discovering sustrucure

A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y

he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans

omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg

t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre

imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5

T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~

iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~

he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~

optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~

The ~ialization module modina tile substructure to apply in a more constrained ewironme

Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i

nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S

aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f

be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes

and the possible applications of the SLBDLE substructure discovery system in a vanety f

domaltls

Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf

~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-

lass vf problems in the area of nachine learning OJeraton In real-world domains demands f

eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates

n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll

of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes

Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1

S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~

62 Future Research

$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE

system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei

include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS

Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy

reVIOUS suess of silIlLu rJe appiiauns

esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of

substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can

tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]

lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can

be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions

in both the background knowledge architecture 3nd the opeations peforned by the discovery

algorlthm

Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess

may arise from a better method of substrlctie instantiation C rrent letbocs do not

satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation

by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C

lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres

exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba

Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods

would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery

process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt

the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be

discovered with a single application of the discovery algcnthr1

Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~

uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve

the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery

rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge

z~ess l ~

One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount

knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a

knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts

623 Applications

Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste

SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and

compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung

systems

One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage

composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e

eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon

slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of

abstraction

The n~ to access information fom larse databases has betooe oonlonlac In ncs

omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so

do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver

repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard

Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte

database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture

sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1

9

REFERE~CES

[Doyie79J

[HoaS3]

L --]~ ason

~Iicalski33bl

G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)

J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162

J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27

R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288

K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987

W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933

W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947

J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J

General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16

J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977

D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC

~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239

R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy

R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13

R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363

S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3

T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50

J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp

~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~

83

- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es

DeJ IS 11

dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T

s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11

nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11

race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T

Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order

=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je

~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form

(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I

The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC

The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure

ltela01s s the list ~f ~eatlons deining le substrucure r occurenc

In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage

dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec

85

gt bull ~

)~c1 tt I

0 c5 ~ -

13 Output for First Example of Experiment 1 Limit - 6

3qll rilebullbullbullbull

anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~

Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll

S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi

-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I

(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l

Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD

-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i

S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)

11 aCC

A14 Output for First Example of E(periment 1 Limit 10

gt svIdue liait 10)

n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t

Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco

T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~

S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO

~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS

~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J

A 16 Input for Second Example ot Experiment 1

Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J

conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)

Al7 Output for Second Example ot Etperitnent 1 Limit - 4

l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I

5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs

c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I

middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt

middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I

So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS

coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j

ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103

~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~

89

S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a

ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO

~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H

ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09

lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()

~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull

[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I

l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i

[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~

C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i

SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1

co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i

middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))

Ec ~lce

Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10

Betti ~Ice

i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~

umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1

SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -

1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G

91

s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no

TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~

~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f

=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ

~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i

A2 Experiment 2

Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd

saostructure background knowledge nodule Two examples from the chemical dooain ae

eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng

previously disc ered subst-uc~ures to subsequent discovery tasks

11 Input for First Eumple of poundXperimenl 2

kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i

S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))

slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)

sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J

~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI

Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))

VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)

( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I

(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i

(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I

sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)

Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j

SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)

~o e ) ~Wlnt ubstUC~oltes

Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo

95

gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~

U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l

~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J

carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )

Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))

Defxollrii ilill J ~cIl Clr carJ)

I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)

Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J

ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))

A32 Output for Experiment 3

gt lubclue inl~ JO)

Seli tlcebullbull_

m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil

SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo

aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot

ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~

licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)

Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc

wri OCCtilJtOiCS

101

4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4

~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i

(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))

42 Output for Experiment 4

PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil

Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)

ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl

u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1

O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~

s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI

W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -

~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a

S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1

TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a

103

~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~

Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-

bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)

ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)

I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))

r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l

(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)

(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)

coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)

Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i

(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at

Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e

(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ

~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj

OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs

I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten

~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull

9

SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1

I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1

Addina substructures 0 SKbullbullbull

O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1

5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I

A3 Experimeut 3

Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15

hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le

desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e

99

sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~

5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~

1~Ie c2 lt6 ~u)

QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32

c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)

lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J

SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))

A52 Output for Experiment S

gt (sulxh bullbull lmu )

Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll

RUMinl sulntucture discovey

DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl

Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~

WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11

( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I

(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I

[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt

~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90

Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J

ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot

~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I

10$

bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D

icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)

([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD

((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I

([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~

101

middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a

oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0

CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1

Ed ~lCe

109

Page 19: DISCOVERING SUBSTRUCTURE IN EXAMPLES

L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do

BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do

if (not (member e 0raquo S S U leI

return D

Figure 24 SubstrUcture Discovery ~lgoritbm

Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom

the substructures In S until either the computational limlt L is exceeded or the set of candidate

substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe

heuristics of Section 2A are employed to choose the best substructure from the llternatives in S

The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he

implementation Section 322 descibes the comrutatlon used in StBDlE although otbe

methods may be used For instance the heutlStics could be weighted selected by lhe user or

selected by lhe backcround knowledae However uy substructure selecion method should

involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in

BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of

discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi

Section 23 are then used to construct a set of new substructures that are stored in E Those

subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the

loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res

23

CHAPTER 3

mE SUBDUE SYSTE~I

An implementation of he substructure discovery methodology descrIbed in the frevous

chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments

Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a

substructure concept discoverer and as a mod le in a more robust nachlne learning system In

addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a

sbstructure specialization module and a substructure background knowledge module for utiiizilg

prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd

krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres

ovillCh of these substruc1res are resent in the lnput examples

i

SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of

SubstructJre Speeializer-k of(

C ser-Supplied Input Exampies Substructure Discovery

Figure 31 Tlle SlSDLE System

(a) SUbstructure

[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]

(b) External Representation

I I

~ Sl

i V

~-T t -

I

~ Cl i

(c) Inte~al Represe~tation

Figure 32 Substructure Representation

21

SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do

SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI

return -EWSlBS

Figure 33 Substructure Elpanslon Algorithm

Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae

stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are

stored in For each neighboring relation a new substructure is formed by adding the nelghborrg

-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to

the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een

considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes

~xanded from the original substructure

322 Selection

The heuristic-based substructure discovery module selects for orslderation those

substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs

compactness connectivity and covenge These four heurlstus are used 0 order he set of

alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut

ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~

selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons

~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he

eurstlc value of a substructur

29

urlent set of Input xam-llS

deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e

substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le

Input examples

r rtlations Sl compactnns(S) = ----shy

For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus

conpactness bull 4 bull 1

The third heUristic connectivity measures the amount of extemal connection in 1he

occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of

external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be

onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed

connectivit-Spound) = locel

Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In

Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure

22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO

outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons

Thus connectivity (12 41 bull 13

T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es

31

ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt

First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~

tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as

there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle

substructure as denned in Section 3U The Object names within the substructures le aroltrar

symbols generated by the system for eacb ewly constructed substructure

S bull i lt 11 OBJECT-0001gt f

~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S

and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by

addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5

ae st)red In E

Input Example Substructure

amp i 511

~I Rl I- lSi ~

52 I S I

LJ L

Figure 3-1 Simple Example

33

11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~

1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l

le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~

suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~

by the algorabm

33 Sllbstructure Specialization

The substructure specialiution module In SLBDLE employs t simple teChlllq ue r

s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25

SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added

attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e

)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh

the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e

i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota

eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d

knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e

s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton

rocess

331 Specia1ization Algorithm

Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te

algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~

elamrles

he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~

Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l

osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~

eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e

35

~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor

Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle

mount of specializatlon In Srpc

A-rount_of_spKlaLization( S_) = --_______ (occl

The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture

acKiround knowledge along w1tb the originally discovered substrucure

332 Example

As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a

lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute

-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he

sare substruc~ure emerges as in Figure 3-

S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt

ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule

Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES

A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l

~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli

s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt

Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~

SPECSLBS In thIs example IS

S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt

T~is specialized substructure also las J occurrerces but 3 untque values thus

amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC

specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he

substrucnre background knowledge along with the oriiinally discovered substucur

SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon

about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be

s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred

substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq

SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue

nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization

nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be

mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec

s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks

3 SubstMlcture Background Kllowledge

T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts

39

O T SHA ETRL~GLE SHAPE-SQLARE

Figure 37 Substructure Backgolnd KnowledgeEumpie

ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt

su~ported

As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge

lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle

ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s

SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y

Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute

est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt

he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn

n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and

~1

1--- ~ed v blue ~

i I shyI

I~

I

I

I

I

SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE

Fgure 39 Three Background Knoledge $ucstroJcJres

he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e

new substructu-es In terms of the substructures already known

3 bull 42 IdentifyiD1 Substructures in Eumples

As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-

~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___

~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I

i SH~PETRIAGlE i t

I I

I I I SHAPEmiddotSQlARE

[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]

Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]

Fig-ure 310 Substructure IdentiAcation Eumple

substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles

disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks

--

lll=H Eumple - r

i III

I I H

8r- shyj

V V

Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red

C Value(S)a8

L 6 a

al uelt S)-312

alueltS)-21

0 alue(S)96

middotalue(S)-11

6 I

bt

10 ~ V

- -

I

alue(S)middot31~ aue(S~-96 -- - middotalue(S)92

A I

j

aI ValueS)-312 I

I I

I bull

---

- -I

Vaiue(S)-103

rgure 41 First Etample of EXiIrinent 1

elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS

of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll

EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$

overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0

consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r

be Input examples

2 Ex~riment 2 Specialization and Background Knowledge

The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem

Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0

~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS

lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege

During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task

As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut

discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs

acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa

Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed

subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk

The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure

-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e

cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s

shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese

discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized

substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed

rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces

the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed

substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe

background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to

deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe

background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres

Ii

C 0

H- C C - Ii C1

(Sr C1 rl

C C ~

C C C-H C C- ii C C-Ii 1 i

~ Sr- C C C

H-C C-Ii H C

Ii H

H

fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule

13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples

tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~

of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna

clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn

whIch to focus ile attentIon of the conceptual clusterI~ rocess

A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA

Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer

lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific

Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e

added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features

t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and

CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered

Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe

SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas

r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single

)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d

subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle

substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure

spannng ncre han one ualple

Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni

and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle

53

nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy

descrlOed il [10gensel87J the two best d usterngs discovered are

Color of engine Aheels IS Blackmiddot middotWhitemiddot

W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd

subs-~c~~re discovery =odule is

lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt

In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By

addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI

exa~Fles Je ~wo best lusterngs discovered lre

umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four

Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec

JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure

attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen

har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE

T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~

suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the

StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne

IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree

that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators

within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover

macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~

CiscJsses similrlties and differences between SLBDLE and PLoSD

Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure

of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The

Jperators forths conaln are taken from [isson80] and are repeated e10w

PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY

PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)

STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)

LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)

For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec

e

51

STACK(XZ)

~ CSTACKCYZ) PICKLPOi)

PCTDOW(Y)

Figure 49 Diseovered (alto-operator

system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy

vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be

~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex

45 Eltperiment 5 Data Abstraction and Feature Formation

EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he

cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3

arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng

on the same Tu3S Instruments Eplorer as the SLBDCE system

Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les

given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores

to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in

the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of

178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J

5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C

SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c

61

--

~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll

Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~

hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH

program searCles for tWo tpes of substrucure In tbe blocks world domain The first type

involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a

set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by

the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes

of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall

1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he

ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of

SlBDLE

lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S

cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce

OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence

~ ~ shyB

I ~ (

E G- c c=7 0 IA B CD

i Lr

(a) ( )

63

A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1

0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL

E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL

The tommcm-realiotships-list contairs the four relations Ossessed by more than half the

candidates

Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt

~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le

Sumber of propUiH In Intltwto

Number of propenlH n Ilnlon

vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-

ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre

A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5

65

~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e

ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~

difficult Running both tbe sequence and common relation discovery processes nore tban once

mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl

difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd

domain

T~e final comFarlson of 11e two syseS Involves ~he representation used to store the

discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the

subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe

sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot

lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1

roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse

Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines

comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe

semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto

oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0

SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert

background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l

more general epresentaticn such as the semantic network used y ~he ARCH pro~ran

53 COfDitive Optimization in olrs S~R Program

R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte

~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt

[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S

Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-

-(5-s)C =---shy

s

slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s

represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1

polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a

inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR

seeks a grammar whose eiementS m8llnize eVe

S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n

322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number

of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f

OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings

of he substrucre can be upressed as

Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy

cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not

conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on

he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into

~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture

lmprovements to tbe cognltive savings measure

~ Substructure Oiscovery ill Uihitehalls PLAD System

Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons

These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn

69

The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as

Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs

efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda

overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta

mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence

Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ

COXTEXT 1 ogsav macro OS

100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot

8 5 ~u - (~tmiddot z)middot

Instantiated Action Sequlnce lt B 112 Z ~11J Z

COlEXT 2 ogsav macro~s

20 ~l bull (~lumiddot Z)shy

Instantiated Action ~uence A B 11

Al interesting macrops discovered End

Figure 53 PLAXD Etam~le

1

OLPTER 6

COSC1USION

Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S

of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty

Url about such an environnent the entity nust abstract over unInuresting jetJil and discover

substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s

teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced

domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l

this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve

61 Summary

The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl

onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s

aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e

environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge

equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl

aleratves nvoiec In 3 computational method 01 discovering sustrucure

A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y

he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans

omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg

t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre

imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5

T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~

iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~

he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~

optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~

The ~ialization module modina tile substructure to apply in a more constrained ewironme

Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i

nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S

aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f

be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes

and the possible applications of the SLBDLE substructure discovery system in a vanety f

domaltls

Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf

~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-

lass vf problems in the area of nachine learning OJeraton In real-world domains demands f

eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates

n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll

of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes

Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1

S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~

62 Future Research

$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE

system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei

include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS

Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy

reVIOUS suess of silIlLu rJe appiiauns

esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of

substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can

tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]

lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can

be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions

in both the background knowledge architecture 3nd the opeations peforned by the discovery

algorlthm

Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess

may arise from a better method of substrlctie instantiation C rrent letbocs do not

satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation

by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C

lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres

exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba

Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods

would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery

process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt

the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be

discovered with a single application of the discovery algcnthr1

Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~

uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve

the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery

rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge

z~ess l ~

One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount

knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a

knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts

623 Applications

Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste

SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and

compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung

systems

One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage

composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e

eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon

slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of

abstraction

The n~ to access information fom larse databases has betooe oonlonlac In ncs

omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so

do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver

repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard

Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte

database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture

sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1

9

REFERE~CES

[Doyie79J

[HoaS3]

L --]~ ason

~Iicalski33bl

G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)

J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162

J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27

R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288

K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987

W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933

W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947

J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J

General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16

J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977

D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC

~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239

R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy

R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13

R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363

S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3

T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50

J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp

~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~

83

- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es

DeJ IS 11

dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T

s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11

nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11

race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T

Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order

=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je

~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form

(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I

The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC

The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure

ltela01s s the list ~f ~eatlons deining le substrucure r occurenc

In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage

dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec

85

gt bull ~

)~c1 tt I

0 c5 ~ -

13 Output for First Example of Experiment 1 Limit - 6

3qll rilebullbullbullbull

anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~

Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll

S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi

-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I

(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l

Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD

-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i

S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)

11 aCC

A14 Output for First Example of E(periment 1 Limit 10

gt svIdue liait 10)

n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t

Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco

T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~

S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO

~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS

~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J

A 16 Input for Second Example ot Experiment 1

Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J

conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)

Al7 Output for Second Example ot Etperitnent 1 Limit - 4

l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I

5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs

c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I

middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt

middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I

So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS

coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j

ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103

~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~

89

S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a

ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO

~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H

ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09

lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()

~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull

[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I

l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i

[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~

C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i

SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1

co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i

middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))

Ec ~lce

Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10

Betti ~Ice

i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~

umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1

SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -

1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G

91

s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no

TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~

~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f

=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ

~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i

A2 Experiment 2

Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd

saostructure background knowledge nodule Two examples from the chemical dooain ae

eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng

previously disc ered subst-uc~ures to subsequent discovery tasks

11 Input for First Eumple of poundXperimenl 2

kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i

S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))

slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)

sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J

~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI

Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))

VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)

( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I

(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i

(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I

sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)

Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j

SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)

~o e ) ~Wlnt ubstUC~oltes

Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo

95

gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~

U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l

~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J

carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )

Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))

Defxollrii ilill J ~cIl Clr carJ)

I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)

Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J

ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))

A32 Output for Experiment 3

gt lubclue inl~ JO)

Seli tlcebullbull_

m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil

SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo

aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot

ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~

licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)

Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc

wri OCCtilJtOiCS

101

4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4

~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i

(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))

42 Output for Experiment 4

PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil

Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)

ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl

u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1

O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~

s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI

W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -

~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a

S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1

TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a

103

~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~

Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-

bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)

ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)

I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))

r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l

(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)

(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)

coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)

Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i

(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at

Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e

(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ

~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj

OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs

I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten

~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull

9

SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1

I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1

Addina substructures 0 SKbullbullbull

O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1

5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I

A3 Experimeut 3

Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15

hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le

desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e

99

sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~

5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~

1~Ie c2 lt6 ~u)

QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32

c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)

lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J

SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))

A52 Output for Experiment S

gt (sulxh bullbull lmu )

Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll

RUMinl sulntucture discovey

DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl

Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~

WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11

( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I

(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I

[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt

~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90

Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J

ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot

~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I

10$

bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D

icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)

([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD

((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I

([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~

101

middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a

oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0

CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1

Ed ~lCe

109

Page 20: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 21: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 22: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 23: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 24: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 25: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 26: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 27: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 28: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 29: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 30: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 31: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 32: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 33: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 34: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 35: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 36: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 37: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 38: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 39: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 40: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 41: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 42: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 43: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 44: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 45: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 46: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 47: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 48: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 49: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 50: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 51: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 52: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 53: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 54: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 55: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 56: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 57: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 58: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 59: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 60: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 61: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 62: DISCOVERING SUBSTRUCTURE IN EXAMPLES
Page 63: DISCOVERING SUBSTRUCTURE IN EXAMPLES