DISCOVERING SUBSTRUCTURE IN EXAMPLES
Transcript of DISCOVERING SUBSTRUCTURE IN EXAMPLES
~ay 1988 CI-C -EG-88-2223
COORDI~-TED SCIE~CE L u30 RTORY CJi~ege of Engineering
DISCOVERING SUBSTRUCTURE IN EXAMPLES
Lawrence Bruce Holder Jr
t~I-ERSITY OF ILLINOIS AT URBAN-CH~rv[PAIG~
ApprYed r~r Pubiic Release Distribution Lnlinuted
------------~------------------~~~~~~~~--~~~--------------------------------REPOAT DOCUMENTATION AGE OI10lT SiCftlTY CiASSPI(A flON
Unc las s te1 ed
it NAME OF IIpoundRfQAMING ORGANIZATlON
Coordinated Science Lab University of Illinois
k AOORESS (Citybull $ fI1III l~ (OOI
1101 W Springfield Avenue Urbana IL 61801
~ME OF FUNDING J SPONSORING ORGArZATlON ~S ~ I (l IOAPA
k AOORESs(CiC) StItbullbull ZI
Jas 0 in 2 t en bull gt bull C ) ~R~ 52Ar un ~ton A ____ _
Arli~zton VA 2~~OQ_23ng
6111 OFFICI SYMIOI (I Iblf
NIl
111 OFI(( SYMIOL (If 1ItJIi(10III1
~isc~ver~ Substructure i~ ~xaMles
Il IiIUISONAI AIJHOItI~older Jr bull Lawrence 3ruce
13 rtPE JF RlPORT achnial t11b rIME CD~EREO
-ROM 0
0 ill$Tll(1I 4AftI(INGS
~one
l OISTIUTCNIA~UTY 0 velr Approved for public releasej distribution un11~1ted
7bull ~AMI OF MONITORING OAGMILtTlON
~S~ 0~R and DARPA
7b AOO155 (City Stiff rwI llli (OtIfII ~shingtgn Dr 1055~
Aclin~ton ~A ~202
Arlin~ton ~A 22n9-23QR t fItOCUQMINT NSTftUMINT IDiNTlFlCA1ION NUMIER ~SF IST-85-11170 ~OOOL4-32-K-01R6 ~o0014-~i-K-0874 -
TASK NO
COSAn coon bull SU8JICT rEitMS CormllCl4t Ott l1~ru 1 usury WI OItrtI~ br 0I0dt MJlftOfrJ
lachine learning substJC cre ~ iscover S3DtE
i5 htslS dllCflb-S a methoci for discoering substructure concepts in eX3mJ)les 7hl method inoes a computa llonail constrained bn-orst search Iuuied by four heuristiCS coglitle ~alnt~ compactness connectivity and covenge Elc heuristic slescrbed n detail aion~ Ith ts rOie in tllultln ln mciciltal SU-stTJCIUre concept The SlBDLl s-slm ~Ill ImJlemencs the method contains l substructure discoe-y module l substructure specialiZ3tlon module nl i Incemer-tal substructure aCKiround knolecige module for applYlni jleously discoered substructure oncepl 71e sub~tructue back round lo~led~e Includes both user-delineci ~nd SSJlE-Jiscovertd sub~lruCturtS n a hierarch ampt s used to jetene bC1 sulstructures are present In the Input examplelt 7he ~ynem has periormed Qell on nllnoer oi ~iamC5 r)m dllfe~ent domainS and has discoerecl man rtetlt5Ilnpound ~ubstructure concepls such 1$ a )omlc 1~t )11 Taco-operllor for ltacillng blocks he Tc-od and mpltTentltion oi tle StBDLE systen u d~~C=IC(d lci In lUl~lI$ oi txpe-tmentll result s presented
20 1ali ilOIli AVAlioAIIUTY OF A8STRACT l1 ISTIACT S~JIUTY CuSIFI(ATlON tl N(USIFIEO~NIMITEO a 5AM( AS ~T o OrC USEitS tncass ~aa
jl APlIIGlllon TIl oe uS44 ojMII tampllwIt All otr edltlO re OOl4let bull
OISCOVERIG StBSTRCCTLRE 1 EX--lPLES
8Y
L- WRECE 8RLCE HOLDER JR
8S Lmiddotrive-sity of Illinois at Lrbana-Cbampaign 1986
THESIS
Submitted in partlll fulfillment of tne requIrements fvr tne degree of -laster of Sciene in Computer Science
in the Graduau Cgtlege of the Lnwersity of Illinois at Lrbana-Champaign 1988
Lrbana IllinOIS
Lawrence Bruce Holder Jr 1S Department of Computer Science
tniversity of Illinois at trbana-Champaign 1988 Robert Earl Stepp m Advisor
T~llS thesis describes a method for disltovering substructure oncepts in uamples The method
Involves a computationally constrained best-first search guieed by four heuristics cognitive
savings compactness connectivity and coverage Each heuristic s described in detail along with its
role In ealuatng an individual substructure concept The SlBOlE system that implements the
method contains a substructure discovery module a substructure specialization module and an
mcemenal substructure background knowledge module for applying previously discovered
substructure concepts The substructure background knowledge includes both user-defined and
SLBDLE-discovered substructures in a hierarchy that is used to determine which substructures art
present in the input eumples The system bas performed well on a number of eumples from
dderent domains and has discovered many interesting substructure conceU sucb as an aromatic
rrg and a macro-operator for stacking blocks The nethod and implementation of the SlBDLE
sstem are described and an analysis of experimental results is resented
iii
IJould like 0 thank my advisor Robert Stepp for ~S uoerous onrbutors tJ te etas
within this thesIs His talent for Insigbtful thinking and patient communicatlon has nade c
work both enjoyable and rewarding
I would also like to thanJc Brad Whiteball Bob Reinke Bharat Rao Diane COOk and Bob 1041
for their comments and suggestions regarding this work Tlanks also to the rest of the Artficai
Intelligence Researlh Group at the Coordinated Science Laboratory especially to my office mate
~ott Benr~ett for his patient participation in our discussions Special thanks to our s~e3r
Fancle Bridges for her ever jleasant assistance
Finally I would like to thank my family for their enthusiastic support and encouragement
brougcout my academic endeavors
This researCl was partIally supported by the ationa1 Science Foundation under grant SF
IST-85-11170 the Offe of ~aval Researcll under grant ~OOO14-82-Kmiddot0186 by Defense Advanced
Research ProJ~ts Agency under grant ~OOOl$-ampi-K-Oamp7$ and by a gIft from Texas rnstrumenu
Inc
T vlLE OF CO-TETS
CHAPTER
1 ITRODlCTIO
2 SlBSTRlcrtRE DISCQER- ~ ~ 5 21 1otivations 6 22 Substructure ReprSentatlon
I
23 Substructure Gone-ation 11 2- Substructure Soeleclon 13 25 Substructure Instantiatlon 17 26 Substructure Specialization 19 21 Background Knowledge 21 2S Substructure Dlsovery ~lgorltln
3 THE SlBDLE SYSTE)( 25 31 Substructure Representation 26 32 Heurlstlc-Based Substructure Discovery 25
321 GeneratIon 25 322 Selection 29 323 ExamFle 32
33 Substructure Specialization 35 331 Specialization Algorithm 35 332 Esampie 37
34 Substructure Background Knowledge 39
3~1 ~rchltecture -10 342 Identifying Substructures in Et~nplts -13
EXPERI(E-n -16
41 Experiment 1 VarYlng the Computational limit -16 ~2 Experiment 2 Specialization and Background Knowledge 19 3 Experiment 3 DisJverln~ Classifying A~trlbues In )lultilie xamplS 53 - Experiment~ Dls~vertng Iacro-Operators in Proof Trees 56 S EXet1ment S Data Abstraction and Feature Forllatlon 59
c REL~TED VORK 62 51 Gestalt Pshoiogy 62
52 Discovenrg Grop5 f Objects In Winstgtnmiddots ARCH P-cgram 63 ~3 Cvintlve OptlrlzJt~n in Wolffs SPR Pcgrar 67 S Substucure Dtsoery In middotbitalis PL-D System 69
CH~TER
6 COCL tSIO
6 1 Suoary () 2 Fmiddottue Resea-cb ~ _ ~
611 Substucture DiSCOvery and InstantIation 6
622 Background Knowledge 75 623 ApplicatIOns 79
REFERECES 51
APPEDIX A EXPERIYIET D-T~ 5amp AI Experiment 1 56 A2 Expenment 2 93
~ 3 Experllcent 3 99
~ Etperiment J 102 AS Experiment 5 104
vi
QPTER 1
LTRODtCI10N
At any Iven moment the amount of detailed information available fr~m an envlronmel s
overwhelmIng For elample dose observation of a brick wall reveals not only tbe rOINS d
rectang11ar brIckS but also tbe mortAr between tbe bricks the pitted surface of the brIcks and
mortar small cracks In tbe bricks etc Yet hUmans bave the ability to Ignore such detail ard
ettract infermation from the environment at a level of detail that is appropriate for he purpose cf
tbe ocservatlon Even n an unfamiliar environment lumans ignore intricate det1il and iscert
more abstract patterns in tle stimuli of the environment [Witkin83] This thesis describes a
com~utatonal mettod for discovermg abstract patterns or substruct~re in the descriptions 0 a
strucured enVrgtnment
Vhen obser atlons at varying levels of detail are necessary ~uruns are caable of descendrg
nto be more minute structure of the environmental stimuli and centlfymg patterns In arms f
hese stl1cural primitlves From the observations at different levels of detail humans na
construct a lle-archical description of the envirlnoent For nstance tbe brIckmiddot loll an be
descrbed as he rectangular bricks in tbe wall along with tbe interconnecons betmiddotveen tle mcilts
Furthermore each rick can be described in terms of the pitS racks and embedded graIns in he
bnck and each interconnecion can be described by the cl11pcnelts oi h~ mortar that ombme ~
form tle IntercoMecion Thus each Ievei in ~he descrption ~ierarchy rerresents I different eel
of detail for represen~ing ~he environment
Once such a bleoarchy is constructed similar prmitive struc res in diffeent middotWlCrmels
nay suggest tle Uistence of mere abstract featres higher ill n he hlerarhy Tls s~rltl
1
nty reFrese1ng the s~osJc~lre ~S slmpLfrg tergra lU lttl-lt~S I- lcdr e
1ewly ciscoveed substr-cure is s~iaiized by aire~jng adcli0ral s~rmiddotJ~ re T1S )lta ~
s~=stJcture descees l ~ager nore sptcfic poton of the -If~rt set of llut ltumies --e
suostJctU baclqrunc knowledge rcludes botll user-defined and discovered subs~J
definitions These are used to identify instances of substructures tilat exist In a sIVer set of ili
erampies Chapter 2 concludes by outlining a substructure discovery algortlm Incorporatng ~he
aforementlcned cmponents
Chapter 3 describes the implementation of the substructure discovery nethodoiogy ontained
In the StmiddotBDtmiddotE system In addition to the substructure discovery module StBDLE also lcludes a
substructure specaliZltion module and a substructure background know1edge module After a
substructure is d~oveed the substructure lS specIalized and botil tile original and secaiized
substructures are stored in tle background knowlCdge Within the background iinomiddotlledge the
substructures are kept in a hierarclly tbat defines omplu substructures in terms of more primltve
substructures tpon receiving a new set of input examples the background know~dgt rnodJe
determines which of the stored suostructures are resent in the etamples ThJs as he SLBDLE
system runs a hie3rcllical representation of selected structures found in he env ironnent ~s
constructed within tlle background knowledge modJle
Chapter presents several experiments witll the StBDLE systec EXFements wall tne
s1bstructure discovery module indicate that the suostucture generation and substructure seiecton
processes triorm vell in guiding the search towards more interestln~ substructures Other
experiments incorporating botll the substructure background keowledge module lnd spetallutlon
module demonstate tbe performance improvement obtaned by using revlousiy earned
sulstructures In subsequent discovery tASks The input and output data for each exerlent are
isted in ApperdlI A
Chapter strleys -revious work -elated to suostJcure disc)ery T~lS ~orK ilS ak l be euly 1900s Imiddotlen gestalt psychologists began sucying the InderYI~g ncess5 Lmiddotec i
3
CHAPTER 2
stBSTRUCTtJRE DISCOVERY
Te major focus of t1is work IS to investigate methods for discoveing substructure concepts
in examples Substructure dlScovery is tle process of identfYlng concepts desclbl1g n~ere~tIni
and repetitive -hunks- of structure within the individual elements of a set of structured examples
Once such 3 substructure oncept s discovered the descrIptions of the examples can be Simplified
by replacing all occurences of ~he substructure ith a single form that represents the
substructure The simplified descriptions may then be passed to other learning syst~ms In
lddition tbe discovery process nay be apFlied repetitively to urther simplify the deserlptions or
to build a hierarchical interpreuticn of the uamples in terms of hell subparts
ThIS hapter presents ~he components involved in a cgtmilItatlonal nethod for SiJostrJcture
discovery SKtion 21 disc1sses the motivations for developing suet a method anc he importance
of substructure discovery in the 5eld of artificial intelligence Seton 22 1eftnes a stbn~crWt
along with the components comprising a substructure [etods for geneatlng alte-1ative
substructures are presented 1ft Section 21 and the tK1niques used for inteHigently selecting among
these alternatives are discussed in Section 24 Once an appropriate substructure 15 discovered the
substructure is irurantUud for ncb occur-ence of the substructure in the input examples Sect~on
2 discusses substructure irstantiatl0ft ewly discovered sUOstuetures ~an be ipecialized to
describe larger substruc~ures in the input eumples Section 26 disc~sses suosrlctlre
s~aization Section 27 presents methods for using background lo~ledge in the SIoStrlctlre
dlscovery process Finllly Section 28 outlines a substluure Es-oery alsolthm lncorpcrawg
the previously described t~hni~ues
the observaton By tercelvmg only he appropriate Informatior lumans U~ bener abie to el-
the concepts implied by the environment
Thus the undedyu~ motivations for investigating substructure ~overy are the IblqlJlto~
hlenrchical structure in the world and 1e ease with Ntllch bumans can negotiate the henrchy
With an appropriate representation for substructures and the substJcture hierarchy a
substructure discovery sysem can perform the asks Just presented
22 Substructure Representation
A substructure is both a portion of a collection of structurally-related obects and a
cesclption of the concept represented by that portion For uample the detaied structure
CJmposlng ttle concet of a brIck is a substucure of a brck wall Tle collecion of atoms ard
bones compnslng he concept of an aromatic -ing 5 a substrucure of nany Clrgarllc ~cmpouncs
However substrucure concepts are ~ot always nteresng LPOl encountenng aJr1ck wal the
concept of a brick nay not be as interesting as tohe oncept Jf a doorway or Window T~e task of
SUls-ructure discovery is to ~nd i11lDurirtg subsluctue middotoncepts In a given 5-cicltlon d
structurally-related objects The representation of strllcured oncepts sbould be onducive to tile
wk of discovering substructures This sectior resents a grapblcal repfese~tatlon for suctufee
conceptS and a language for describing tne concepts
A coUlCtion of strJctured objects C3n be epfesentec by 1 sTaph CSlng il gnpl
representation a substJctlre is a collection of annotated nodes and eages comt=rlslng a onne~ed
slt~rlph J a larier jTaph The nodes epresenl single oects aics or eiltltols in the
Slbstuc~le lnd tlle edges -epresent the ion~lCtilrs emiddotmiddoteen ea~cns lnd ler arglrents Fr
annctated Ntb gt and o~e In~tl~ed Nltll gtft Te en odt s rnected c ~tl te a ngtce Jnd te
SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St
elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute
Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e
Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the
nput nampie n Figure 211 IS
lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt
red-I ill
green-i SI ~ ~------------~~~lt- --- OBJECT-0001
) ----Rl
1amp --- OBJECT-0002 amp-blue
I~l 154- red LshyI I
I j
blue blue
(1) I1put Example ()) SubstJcure
Fig1Jre 21 Exa~plt Substrlc~re
9
[nput Example Substlcture
A)
V
(b) Object Overlapping Substructure
(c) Object and Relation Overlapping Substructure
Figure 22 Disjoint and Overlappin Substructures
Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~
~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the
densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle
lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1
factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl
Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e
(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing
subsuIctures and substructure speialization to attach contextual nformation to newiy discovered
substructures
2S Substructure mstampl1tialioD
Once an interesting substructure is disltovered the input eumples can be recast by replactng
the obJects and relations of each occurrence of the substructure with a single entity representing
tbe abstact substructure This replacement is termed substructure illstanliatwlt After
pltrforming one substructure instantiation a substrueture discovery system may continue to
discover more abstract substructures in terms of those already instantiated in the input eumples
This section considers two methods of substrueture instantiation
One method of substructure instantiation replaees eaeb occurrence by a single object
Difficulties in using this method arise from representation roblems Although all the ob]fCts and
relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not
replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be
maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only
confounds the insuntiaion problem In the case of object overlap each instartation of the
overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of
relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as
well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre
instantiation using a singie new object Although single object substructure instantiation seens
intuitively promising tbe accompanying representation problems are difficult to overcome
An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of
the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the
slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the
17
3 SubStructure Generation
AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~
and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l
bottom-up and top-down
The bottom-up approach to substructure generation be~lns with the smallest substrJctures n
the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished
by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0
the substructure For enmple according to tbe tllee neighboring relations (presented in Section
2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the
substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures
lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt
lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt
lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt
The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O
S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For
example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10
suestrJures
ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt
The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie
suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0
smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be
11
lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1
jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_
uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly
dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec
substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1
are applicable here
Regardless of how tae metbods are applied each metbod bas advantages and disadvantages
Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev
SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle
combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a
larger substructure but a smaller more desirable substructure may be overlooked in the process
Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS
referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones
linimal expansion begins with smaller substructures expands substructures along one relation
lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource
imlts of tbe system
2~ Substructure Selection
Aier using tbe metbods of the previous sectlon to construct l set of alternatlve
substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0
onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e
roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie
substructures oased on ~her heuristic quality This section presents the major heuristics that 3re
Jpplicable to substructure evaluation
The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata
cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie
savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e
13
Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad
eiatlons of ~be origmal OCClrre1lces may be reconstructea
T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=
replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla
obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue
ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of
concepts deAned In teClS of the Instantiated substructures
26 Substructure Specialization
Specializing substructures is an essential capabtlity of a substructure discovery system For
Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f
mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences
Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the
aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached
atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this
specialization step allows the substructure discovery system to take advantage of additional
information in the input examples avoid learning overly general substructure concepts and Clore
rapidly discover a specific disjunctive substructure concet T115 section resents an approach to
substructure specialization
One technique for specializing a substructure is to ~rform inductive Inference on the
extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the
substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended
occurrences consists of the substructures obtained by upanding each occurrence llth each
neighboring relation of the occurrence Once the set of eltended occurrences is onstructed
Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added
neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg
relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres
19
2 - Background Knowledampe
Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s
wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e
discovery process along more promising paths through tbe space of alternatlve substructJres T~lS
section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate
substructures that are more likely to list in tbe current application dOClaln
The ~wo major functions of tbe substructure backround knowledge are to main tain boh
lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a
glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of
substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu
substructures are defined in terms of more primitive substructures This arrangement suggests ~n
archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie
substructures serve as the justifications for more complex substructures at a higher level in tle
hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote
substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie
substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures
elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying
tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput
uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are
ultimately justified by the relations in tbe input eumples The substructure discoey roess
may then use these background knowledge substructures as a starting point for ieneratllg
alternative substructures
Other f Jrms of background knowledge apply to the task of substrlcure ilscovery
mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses
~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee
1
L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do
BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do
if (not (member e 0raquo S S U leI
return D
Figure 24 SubstrUcture Discovery ~lgoritbm
Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom
the substructures In S until either the computational limlt L is exceeded or the set of candidate
substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe
heuristics of Section 2A are employed to choose the best substructure from the llternatives in S
The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he
implementation Section 322 descibes the comrutatlon used in StBDlE although otbe
methods may be used For instance the heutlStics could be weighted selected by lhe user or
selected by lhe backcround knowledae However uy substructure selecion method should
involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in
BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of
discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi
Section 23 are then used to construct a set of new substructures that are stored in E Those
subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the
loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res
23
CHAPTER 3
mE SUBDUE SYSTE~I
An implementation of he substructure discovery methodology descrIbed in the frevous
chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments
Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a
substructure concept discoverer and as a mod le in a more robust nachlne learning system In
addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a
sbstructure specialization module and a substructure background knowledge module for utiiizilg
prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd
krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres
ovillCh of these substruc1res are resent in the lnput examples
i
SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of
SubstructJre Speeializer-k of(
C ser-Supplied Input Exampies Substructure Discovery
Figure 31 Tlle SlSDLE System
(a) SUbstructure
[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]
(b) External Representation
I I
~ Sl
i V
~-T t -
I
~ Cl i
(c) Inte~al Represe~tation
Figure 32 Substructure Representation
21
SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do
SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI
return -EWSlBS
Figure 33 Substructure Elpanslon Algorithm
Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae
stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are
stored in For each neighboring relation a new substructure is formed by adding the nelghborrg
-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to
the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een
considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes
~xanded from the original substructure
322 Selection
The heuristic-based substructure discovery module selects for orslderation those
substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs
compactness connectivity and covenge These four heurlstus are used 0 order he set of
alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut
ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~
selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons
~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he
eurstlc value of a substructur
29
urlent set of Input xam-llS
deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e
substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le
Input examples
r rtlations Sl compactnns(S) = ----shy
For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus
conpactness bull 4 bull 1
The third heUristic connectivity measures the amount of extemal connection in 1he
occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of
external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be
onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed
connectivit-Spound) = locel
Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In
Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure
22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO
outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons
Thus connectivity (12 41 bull 13
T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es
31
ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt
First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~
tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as
there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle
substructure as denned in Section 3U The Object names within the substructures le aroltrar
symbols generated by the system for eacb ewly constructed substructure
S bull i lt 11 OBJECT-0001gt f
~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S
and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by
addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5
ae st)red In E
Input Example Substructure
amp i 511
~I Rl I- lSi ~
52 I S I
LJ L
Figure 3-1 Simple Example
33
11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~
1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l
le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~
suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~
by the algorabm
33 Sllbstructure Specialization
The substructure specialiution module In SLBDLE employs t simple teChlllq ue r
s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25
SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added
attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e
)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh
the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e
i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota
eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d
knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e
s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton
rocess
331 Specia1ization Algorithm
Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te
algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~
elamrles
he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~
Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l
osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~
eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e
35
~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor
Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle
mount of specializatlon In Srpc
A-rount_of_spKlaLization( S_) = --_______ (occl
The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture
acKiround knowledge along w1tb the originally discovered substrucure
332 Example
As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a
lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute
-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he
sare substruc~ure emerges as in Figure 3-
S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt
ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule
Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES
A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l
~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli
s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt
Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~
SPECSLBS In thIs example IS
S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt
T~is specialized substructure also las J occurrerces but 3 untque values thus
amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC
specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he
substrucnre background knowledge along with the oriiinally discovered substucur
SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon
about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be
s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred
substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq
SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue
nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization
nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be
mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec
s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks
3 SubstMlcture Background Kllowledge
T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts
39
O T SHA ETRL~GLE SHAPE-SQLARE
Figure 37 Substructure Backgolnd KnowledgeEumpie
ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt
su~ported
As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge
lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle
ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s
SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y
Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute
est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt
he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn
n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and
~1
1--- ~ed v blue ~
i I shyI
I~
I
I
I
I
SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE
Fgure 39 Three Background Knoledge $ucstroJcJres
he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e
new substructu-es In terms of the substructures already known
3 bull 42 IdentifyiD1 Substructures in Eumples
As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-
~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___
~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I
i SH~PETRIAGlE i t
I I
I I I SHAPEmiddotSQlARE
[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]
Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]
Fig-ure 310 Substructure IdentiAcation Eumple
substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles
disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks
--
lll=H Eumple - r
i III
I I H
8r- shyj
V V
Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red
C Value(S)a8
L 6 a
al uelt S)-312
alueltS)-21
0 alue(S)96
middotalue(S)-11
6 I
bt
10 ~ V
- -
I
alue(S)middot31~ aue(S~-96 -- - middotalue(S)92
A I
j
aI ValueS)-312 I
I I
I bull
---
- -I
Vaiue(S)-103
rgure 41 First Etample of EXiIrinent 1
elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS
of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll
EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$
overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0
consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r
be Input examples
2 Ex~riment 2 Specialization and Background Knowledge
The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem
Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0
~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS
lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege
During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task
As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut
discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs
acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa
Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed
subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk
The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure
-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e
cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s
shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese
discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized
substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed
rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces
the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed
substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe
background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to
deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe
background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres
Ii
C 0
H- C C - Ii C1
(Sr C1 rl
C C ~
C C C-H C C- ii C C-Ii 1 i
~ Sr- C C C
H-C C-Ii H C
Ii H
H
fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule
13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples
tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~
of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna
clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn
whIch to focus ile attentIon of the conceptual clusterI~ rocess
A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA
Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer
lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific
Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e
added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features
t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and
CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered
Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe
SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas
r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single
)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d
subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle
substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure
spannng ncre han one ualple
Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni
and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle
53
nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy
descrlOed il [10gensel87J the two best d usterngs discovered are
Color of engine Aheels IS Blackmiddot middotWhitemiddot
W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd
subs-~c~~re discovery =odule is
lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt
In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By
addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI
exa~Fles Je ~wo best lusterngs discovered lre
umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four
Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec
JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure
attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen
har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE
T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~
suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the
StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne
IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree
that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators
within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover
macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~
CiscJsses similrlties and differences between SLBDLE and PLoSD
Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure
of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The
Jperators forths conaln are taken from [isson80] and are repeated e10w
PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY
PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)
STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)
LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)
For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec
e
51
STACK(XZ)
~ CSTACKCYZ) PICKLPOi)
PCTDOW(Y)
Figure 49 Diseovered (alto-operator
system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy
vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be
~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex
45 Eltperiment 5 Data Abstraction and Feature Formation
EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he
cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3
arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng
on the same Tu3S Instruments Eplorer as the SLBDCE system
Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les
given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores
to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in
the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of
178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J
5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C
SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c
61
--
~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll
Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~
hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH
program searCles for tWo tpes of substrucure In tbe blocks world domain The first type
involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a
set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by
the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes
of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall
1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he
ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of
SlBDLE
lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S
cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce
OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence
~ ~ shyB
I ~ (
E G- c c=7 0 IA B CD
i Lr
(a) ( )
63
A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1
0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL
E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL
The tommcm-realiotships-list contairs the four relations Ossessed by more than half the
candidates
Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt
~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le
Sumber of propUiH In Intltwto
Number of propenlH n Ilnlon
vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-
ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre
A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5
65
~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e
ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~
difficult Running both tbe sequence and common relation discovery processes nore tban once
mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl
difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd
domain
T~e final comFarlson of 11e two syseS Involves ~he representation used to store the
discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the
subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe
sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot
lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1
roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse
Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines
comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe
semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto
oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0
SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert
background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l
more general epresentaticn such as the semantic network used y ~he ARCH pro~ran
53 COfDitive Optimization in olrs S~R Program
R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte
~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt
[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S
Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-
-(5-s)C =---shy
s
slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s
represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1
polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a
inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR
seeks a grammar whose eiementS m8llnize eVe
S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n
322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number
of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f
OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings
of he substrucre can be upressed as
Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy
cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not
conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on
he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into
~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture
lmprovements to tbe cognltive savings measure
~ Substructure Oiscovery ill Uihitehalls PLAD System
Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons
These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn
69
The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as
Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs
efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda
overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta
mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence
Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ
COXTEXT 1 ogsav macro OS
100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot
8 5 ~u - (~tmiddot z)middot
Instantiated Action Sequlnce lt B 112 Z ~11J Z
COlEXT 2 ogsav macro~s
20 ~l bull (~lumiddot Z)shy
Instantiated Action ~uence A B 11
Al interesting macrops discovered End
Figure 53 PLAXD Etam~le
1
OLPTER 6
COSC1USION
Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S
of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty
Url about such an environnent the entity nust abstract over unInuresting jetJil and discover
substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s
teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced
domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l
this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve
61 Summary
The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl
onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s
aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e
environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge
equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl
aleratves nvoiec In 3 computational method 01 discovering sustrucure
A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y
he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans
omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg
t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre
imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5
T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~
iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~
he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~
optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~
The ~ialization module modina tile substructure to apply in a more constrained ewironme
Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i
nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S
aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f
be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes
and the possible applications of the SLBDLE substructure discovery system in a vanety f
domaltls
Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf
~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-
lass vf problems in the area of nachine learning OJeraton In real-world domains demands f
eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates
n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll
of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes
Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1
S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~
62 Future Research
$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE
system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei
include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS
Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy
reVIOUS suess of silIlLu rJe appiiauns
esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of
substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can
tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]
lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can
be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions
in both the background knowledge architecture 3nd the opeations peforned by the discovery
algorlthm
Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess
may arise from a better method of substrlctie instantiation C rrent letbocs do not
satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation
by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C
lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres
exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba
Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods
would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery
process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt
the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be
discovered with a single application of the discovery algcnthr1
Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~
uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve
the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery
rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge
z~ess l ~
One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount
knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a
knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts
623 Applications
Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste
SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and
compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung
systems
One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage
composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e
eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon
slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of
abstraction
The n~ to access information fom larse databases has betooe oonlonlac In ncs
omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so
do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver
repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard
Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte
database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture
sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1
9
REFERE~CES
[Doyie79J
[HoaS3]
L --]~ ason
~Iicalski33bl
G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)
J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162
J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27
R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288
K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987
W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933
W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947
J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J
General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16
J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977
D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC
~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239
R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy
R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13
R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363
S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3
T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50
J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp
~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~
83
- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es
DeJ IS 11
dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T
s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11
nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11
race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T
Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order
=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je
~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form
(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I
The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC
The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure
ltela01s s the list ~f ~eatlons deining le substrucure r occurenc
In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage
dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec
85
gt bull ~
)~c1 tt I
0 c5 ~ -
13 Output for First Example of Experiment 1 Limit - 6
3qll rilebullbullbullbull
anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~
Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll
S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi
-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I
(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l
Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD
-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i
S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)
11 aCC
A14 Output for First Example of E(periment 1 Limit 10
gt svIdue liait 10)
n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t
Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco
T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~
S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO
~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS
~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J
A 16 Input for Second Example ot Experiment 1
Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J
conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)
Al7 Output for Second Example ot Etperitnent 1 Limit - 4
l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I
5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs
c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I
middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt
middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I
So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS
coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j
ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103
~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~
89
S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a
ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO
~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H
ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09
lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()
~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull
[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I
l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i
[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~
C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i
SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1
co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i
middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))
Ec ~lce
Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10
Betti ~Ice
i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~
umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1
SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -
1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G
91
s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no
TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~
~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f
=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ
~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i
A2 Experiment 2
Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd
saostructure background knowledge nodule Two examples from the chemical dooain ae
eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng
previously disc ered subst-uc~ures to subsequent discovery tasks
11 Input for First Eumple of poundXperimenl 2
kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i
S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))
slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)
sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J
~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI
Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))
VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)
( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I
(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i
(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I
sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)
Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j
SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)
~o e ) ~Wlnt ubstUC~oltes
Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo
95
gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~
U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l
~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J
carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )
Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))
Defxollrii ilill J ~cIl Clr carJ)
I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)
Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J
ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))
A32 Output for Experiment 3
gt lubclue inl~ JO)
Seli tlcebullbull_
m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil
SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo
aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot
ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~
licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)
Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc
wri OCCtilJtOiCS
101
4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4
~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i
(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))
42 Output for Experiment 4
PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil
Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)
ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl
u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1
O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~
s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI
W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -
~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a
S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1
TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a
103
~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~
Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-
bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)
ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)
I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))
r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l
(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)
(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)
coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)
Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i
(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at
Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e
(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ
~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj
OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs
I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten
~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull
9
SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1
I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1
Addina substructures 0 SKbullbullbull
O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1
5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I
A3 Experimeut 3
Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15
hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le
desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e
99
sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~
5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~
1~Ie c2 lt6 ~u)
QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32
c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)
lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J
SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))
A52 Output for Experiment S
gt (sulxh bullbull lmu )
Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll
RUMinl sulntucture discovey
DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl
Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~
WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11
( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I
(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I
[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt
~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90
Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J
ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot
~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I
10$
bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D
icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)
([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD
((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I
([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~
101
middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a
oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0
CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1
Ed ~lCe
109
------------~------------------~~~~~~~~--~~~--------------------------------REPOAT DOCUMENTATION AGE OI10lT SiCftlTY CiASSPI(A flON
Unc las s te1 ed
it NAME OF IIpoundRfQAMING ORGANIZATlON
Coordinated Science Lab University of Illinois
k AOORESS (Citybull $ fI1III l~ (OOI
1101 W Springfield Avenue Urbana IL 61801
~ME OF FUNDING J SPONSORING ORGArZATlON ~S ~ I (l IOAPA
k AOORESs(CiC) StItbullbull ZI
Jas 0 in 2 t en bull gt bull C ) ~R~ 52Ar un ~ton A ____ _
Arli~zton VA 2~~OQ_23ng
6111 OFFICI SYMIOI (I Iblf
NIl
111 OFI(( SYMIOL (If 1ItJIi(10III1
~isc~ver~ Substructure i~ ~xaMles
Il IiIUISONAI AIJHOItI~older Jr bull Lawrence 3ruce
13 rtPE JF RlPORT achnial t11b rIME CD~EREO
-ROM 0
0 ill$Tll(1I 4AftI(INGS
~one
l OISTIUTCNIA~UTY 0 velr Approved for public releasej distribution un11~1ted
7bull ~AMI OF MONITORING OAGMILtTlON
~S~ 0~R and DARPA
7b AOO155 (City Stiff rwI llli (OtIfII ~shingtgn Dr 1055~
Aclin~ton ~A ~202
Arlin~ton ~A 22n9-23QR t fItOCUQMINT NSTftUMINT IDiNTlFlCA1ION NUMIER ~SF IST-85-11170 ~OOOL4-32-K-01R6 ~o0014-~i-K-0874 -
TASK NO
COSAn coon bull SU8JICT rEitMS CormllCl4t Ott l1~ru 1 usury WI OItrtI~ br 0I0dt MJlftOfrJ
lachine learning substJC cre ~ iscover S3DtE
i5 htslS dllCflb-S a methoci for discoering substructure concepts in eX3mJ)les 7hl method inoes a computa llonail constrained bn-orst search Iuuied by four heuristiCS coglitle ~alnt~ compactness connectivity and covenge Elc heuristic slescrbed n detail aion~ Ith ts rOie in tllultln ln mciciltal SU-stTJCIUre concept The SlBDLl s-slm ~Ill ImJlemencs the method contains l substructure discoe-y module l substructure specialiZ3tlon module nl i Incemer-tal substructure aCKiround knolecige module for applYlni jleously discoered substructure oncepl 71e sub~tructue back round lo~led~e Includes both user-delineci ~nd SSJlE-Jiscovertd sub~lruCturtS n a hierarch ampt s used to jetene bC1 sulstructures are present In the Input examplelt 7he ~ynem has periormed Qell on nllnoer oi ~iamC5 r)m dllfe~ent domainS and has discoerecl man rtetlt5Ilnpound ~ubstructure concepls such 1$ a )omlc 1~t )11 Taco-operllor for ltacillng blocks he Tc-od and mpltTentltion oi tle StBDLE systen u d~~C=IC(d lci In lUl~lI$ oi txpe-tmentll result s presented
20 1ali ilOIli AVAlioAIIUTY OF A8STRACT l1 ISTIACT S~JIUTY CuSIFI(ATlON tl N(USIFIEO~NIMITEO a 5AM( AS ~T o OrC USEitS tncass ~aa
jl APlIIGlllon TIl oe uS44 ojMII tampllwIt All otr edltlO re OOl4let bull
OISCOVERIG StBSTRCCTLRE 1 EX--lPLES
8Y
L- WRECE 8RLCE HOLDER JR
8S Lmiddotrive-sity of Illinois at Lrbana-Cbampaign 1986
THESIS
Submitted in partlll fulfillment of tne requIrements fvr tne degree of -laster of Sciene in Computer Science
in the Graduau Cgtlege of the Lnwersity of Illinois at Lrbana-Champaign 1988
Lrbana IllinOIS
Lawrence Bruce Holder Jr 1S Department of Computer Science
tniversity of Illinois at trbana-Champaign 1988 Robert Earl Stepp m Advisor
T~llS thesis describes a method for disltovering substructure oncepts in uamples The method
Involves a computationally constrained best-first search guieed by four heuristics cognitive
savings compactness connectivity and coverage Each heuristic s described in detail along with its
role In ealuatng an individual substructure concept The SlBOlE system that implements the
method contains a substructure discovery module a substructure specialization module and an
mcemenal substructure background knowledge module for applying previously discovered
substructure concepts The substructure background knowledge includes both user-defined and
SLBDLE-discovered substructures in a hierarchy that is used to determine which substructures art
present in the input eumples The system bas performed well on a number of eumples from
dderent domains and has discovered many interesting substructure conceU sucb as an aromatic
rrg and a macro-operator for stacking blocks The nethod and implementation of the SlBDLE
sstem are described and an analysis of experimental results is resented
iii
IJould like 0 thank my advisor Robert Stepp for ~S uoerous onrbutors tJ te etas
within this thesIs His talent for Insigbtful thinking and patient communicatlon has nade c
work both enjoyable and rewarding
I would also like to thanJc Brad Whiteball Bob Reinke Bharat Rao Diane COOk and Bob 1041
for their comments and suggestions regarding this work Tlanks also to the rest of the Artficai
Intelligence Researlh Group at the Coordinated Science Laboratory especially to my office mate
~ott Benr~ett for his patient participation in our discussions Special thanks to our s~e3r
Fancle Bridges for her ever jleasant assistance
Finally I would like to thank my family for their enthusiastic support and encouragement
brougcout my academic endeavors
This researCl was partIally supported by the ationa1 Science Foundation under grant SF
IST-85-11170 the Offe of ~aval Researcll under grant ~OOO14-82-Kmiddot0186 by Defense Advanced
Research ProJ~ts Agency under grant ~OOOl$-ampi-K-Oamp7$ and by a gIft from Texas rnstrumenu
Inc
T vlLE OF CO-TETS
CHAPTER
1 ITRODlCTIO
2 SlBSTRlcrtRE DISCQER- ~ ~ 5 21 1otivations 6 22 Substructure ReprSentatlon
I
23 Substructure Gone-ation 11 2- Substructure Soeleclon 13 25 Substructure Instantiatlon 17 26 Substructure Specialization 19 21 Background Knowledge 21 2S Substructure Dlsovery ~lgorltln
3 THE SlBDLE SYSTE)( 25 31 Substructure Representation 26 32 Heurlstlc-Based Substructure Discovery 25
321 GeneratIon 25 322 Selection 29 323 ExamFle 32
33 Substructure Specialization 35 331 Specialization Algorithm 35 332 Esampie 37
34 Substructure Background Knowledge 39
3~1 ~rchltecture -10 342 Identifying Substructures in Et~nplts -13
EXPERI(E-n -16
41 Experiment 1 VarYlng the Computational limit -16 ~2 Experiment 2 Specialization and Background Knowledge 19 3 Experiment 3 DisJverln~ Classifying A~trlbues In )lultilie xamplS 53 - Experiment~ Dls~vertng Iacro-Operators in Proof Trees 56 S EXet1ment S Data Abstraction and Feature Forllatlon 59
c REL~TED VORK 62 51 Gestalt Pshoiogy 62
52 Discovenrg Grop5 f Objects In Winstgtnmiddots ARCH P-cgram 63 ~3 Cvintlve OptlrlzJt~n in Wolffs SPR Pcgrar 67 S Substucure Dtsoery In middotbitalis PL-D System 69
CH~TER
6 COCL tSIO
6 1 Suoary () 2 Fmiddottue Resea-cb ~ _ ~
611 Substucture DiSCOvery and InstantIation 6
622 Background Knowledge 75 623 ApplicatIOns 79
REFERECES 51
APPEDIX A EXPERIYIET D-T~ 5amp AI Experiment 1 56 A2 Expenment 2 93
~ 3 Experllcent 3 99
~ Etperiment J 102 AS Experiment 5 104
vi
QPTER 1
LTRODtCI10N
At any Iven moment the amount of detailed information available fr~m an envlronmel s
overwhelmIng For elample dose observation of a brick wall reveals not only tbe rOINS d
rectang11ar brIckS but also tbe mortAr between tbe bricks the pitted surface of the brIcks and
mortar small cracks In tbe bricks etc Yet hUmans bave the ability to Ignore such detail ard
ettract infermation from the environment at a level of detail that is appropriate for he purpose cf
tbe ocservatlon Even n an unfamiliar environment lumans ignore intricate det1il and iscert
more abstract patterns in tle stimuli of the environment [Witkin83] This thesis describes a
com~utatonal mettod for discovermg abstract patterns or substruct~re in the descriptions 0 a
strucured enVrgtnment
Vhen obser atlons at varying levels of detail are necessary ~uruns are caable of descendrg
nto be more minute structure of the environmental stimuli and centlfymg patterns In arms f
hese stl1cural primitlves From the observations at different levels of detail humans na
construct a lle-archical description of the envirlnoent For nstance tbe brIckmiddot loll an be
descrbed as he rectangular bricks in tbe wall along with tbe interconnecons betmiddotveen tle mcilts
Furthermore each rick can be described in terms of the pitS racks and embedded graIns in he
bnck and each interconnecion can be described by the cl11pcnelts oi h~ mortar that ombme ~
form tle IntercoMecion Thus each Ievei in ~he descrption ~ierarchy rerresents I different eel
of detail for represen~ing ~he environment
Once such a bleoarchy is constructed similar prmitive struc res in diffeent middotWlCrmels
nay suggest tle Uistence of mere abstract featres higher ill n he hlerarhy Tls s~rltl
1
nty reFrese1ng the s~osJc~lre ~S slmpLfrg tergra lU lttl-lt~S I- lcdr e
1ewly ciscoveed substr-cure is s~iaiized by aire~jng adcli0ral s~rmiddotJ~ re T1S )lta ~
s~=stJcture descees l ~ager nore sptcfic poton of the -If~rt set of llut ltumies --e
suostJctU baclqrunc knowledge rcludes botll user-defined and discovered subs~J
definitions These are used to identify instances of substructures tilat exist In a sIVer set of ili
erampies Chapter 2 concludes by outlining a substructure discovery algortlm Incorporatng ~he
aforementlcned cmponents
Chapter 3 describes the implementation of the substructure discovery nethodoiogy ontained
In the StmiddotBDtmiddotE system In addition to the substructure discovery module StBDLE also lcludes a
substructure specaliZltion module and a substructure background know1edge module After a
substructure is d~oveed the substructure lS specIalized and botil tile original and secaiized
substructures are stored in tle background knowlCdge Within the background iinomiddotlledge the
substructures are kept in a hierarclly tbat defines omplu substructures in terms of more primltve
substructures tpon receiving a new set of input examples the background know~dgt rnodJe
determines which of the stored suostructures are resent in the etamples ThJs as he SLBDLE
system runs a hie3rcllical representation of selected structures found in he env ironnent ~s
constructed within tlle background knowledge modJle
Chapter presents several experiments witll the StBDLE systec EXFements wall tne
s1bstructure discovery module indicate that the suostucture generation and substructure seiecton
processes triorm vell in guiding the search towards more interestln~ substructures Other
experiments incorporating botll the substructure background keowledge module lnd spetallutlon
module demonstate tbe performance improvement obtaned by using revlousiy earned
sulstructures In subsequent discovery tASks The input and output data for each exerlent are
isted in ApperdlI A
Chapter strleys -revious work -elated to suostJcure disc)ery T~lS ~orK ilS ak l be euly 1900s Imiddotlen gestalt psychologists began sucying the InderYI~g ncess5 Lmiddotec i
3
CHAPTER 2
stBSTRUCTtJRE DISCOVERY
Te major focus of t1is work IS to investigate methods for discoveing substructure concepts
in examples Substructure dlScovery is tle process of identfYlng concepts desclbl1g n~ere~tIni
and repetitive -hunks- of structure within the individual elements of a set of structured examples
Once such 3 substructure oncept s discovered the descrIptions of the examples can be Simplified
by replacing all occurences of ~he substructure ith a single form that represents the
substructure The simplified descriptions may then be passed to other learning syst~ms In
lddition tbe discovery process nay be apFlied repetitively to urther simplify the deserlptions or
to build a hierarchical interpreuticn of the uamples in terms of hell subparts
ThIS hapter presents ~he components involved in a cgtmilItatlonal nethod for SiJostrJcture
discovery SKtion 21 disc1sses the motivations for developing suet a method anc he importance
of substructure discovery in the 5eld of artificial intelligence Seton 22 1eftnes a stbn~crWt
along with the components comprising a substructure [etods for geneatlng alte-1ative
substructures are presented 1ft Section 21 and the tK1niques used for inteHigently selecting among
these alternatives are discussed in Section 24 Once an appropriate substructure 15 discovered the
substructure is irurantUud for ncb occur-ence of the substructure in the input examples Sect~on
2 discusses substructure irstantiatl0ft ewly discovered sUOstuetures ~an be ipecialized to
describe larger substruc~ures in the input eumples Section 26 disc~sses suosrlctlre
s~aization Section 27 presents methods for using background lo~ledge in the SIoStrlctlre
dlscovery process Finllly Section 28 outlines a substluure Es-oery alsolthm lncorpcrawg
the previously described t~hni~ues
the observaton By tercelvmg only he appropriate Informatior lumans U~ bener abie to el-
the concepts implied by the environment
Thus the undedyu~ motivations for investigating substructure ~overy are the IblqlJlto~
hlenrchical structure in the world and 1e ease with Ntllch bumans can negotiate the henrchy
With an appropriate representation for substructures and the substJcture hierarchy a
substructure discovery sysem can perform the asks Just presented
22 Substructure Representation
A substructure is both a portion of a collection of structurally-related obects and a
cesclption of the concept represented by that portion For uample the detaied structure
CJmposlng ttle concet of a brIck is a substucure of a brck wall Tle collecion of atoms ard
bones compnslng he concept of an aromatic -ing 5 a substrucure of nany Clrgarllc ~cmpouncs
However substrucure concepts are ~ot always nteresng LPOl encountenng aJr1ck wal the
concept of a brick nay not be as interesting as tohe oncept Jf a doorway or Window T~e task of
SUls-ructure discovery is to ~nd i11lDurirtg subsluctue middotoncepts In a given 5-cicltlon d
structurally-related objects The representation of strllcured oncepts sbould be onducive to tile
wk of discovering substructures This sectior resents a grapblcal repfese~tatlon for suctufee
conceptS and a language for describing tne concepts
A coUlCtion of strJctured objects C3n be epfesentec by 1 sTaph CSlng il gnpl
representation a substJctlre is a collection of annotated nodes and eages comt=rlslng a onne~ed
slt~rlph J a larier jTaph The nodes epresenl single oects aics or eiltltols in the
Slbstuc~le lnd tlle edges -epresent the ion~lCtilrs emiddotmiddoteen ea~cns lnd ler arglrents Fr
annctated Ntb gt and o~e In~tl~ed Nltll gtft Te en odt s rnected c ~tl te a ngtce Jnd te
SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St
elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute
Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e
Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the
nput nampie n Figure 211 IS
lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt
red-I ill
green-i SI ~ ~------------~~~lt- --- OBJECT-0001
) ----Rl
1amp --- OBJECT-0002 amp-blue
I~l 154- red LshyI I
I j
blue blue
(1) I1put Example ()) SubstJcure
Fig1Jre 21 Exa~plt Substrlc~re
9
[nput Example Substlcture
A)
V
(b) Object Overlapping Substructure
(c) Object and Relation Overlapping Substructure
Figure 22 Disjoint and Overlappin Substructures
Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~
~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the
densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle
lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1
factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl
Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e
(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing
subsuIctures and substructure speialization to attach contextual nformation to newiy discovered
substructures
2S Substructure mstampl1tialioD
Once an interesting substructure is disltovered the input eumples can be recast by replactng
the obJects and relations of each occurrence of the substructure with a single entity representing
tbe abstact substructure This replacement is termed substructure illstanliatwlt After
pltrforming one substructure instantiation a substrueture discovery system may continue to
discover more abstract substructures in terms of those already instantiated in the input eumples
This section considers two methods of substrueture instantiation
One method of substructure instantiation replaees eaeb occurrence by a single object
Difficulties in using this method arise from representation roblems Although all the ob]fCts and
relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not
replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be
maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only
confounds the insuntiaion problem In the case of object overlap each instartation of the
overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of
relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as
well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre
instantiation using a singie new object Although single object substructure instantiation seens
intuitively promising tbe accompanying representation problems are difficult to overcome
An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of
the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the
slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the
17
3 SubStructure Generation
AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~
and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l
bottom-up and top-down
The bottom-up approach to substructure generation be~lns with the smallest substrJctures n
the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished
by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0
the substructure For enmple according to tbe tllee neighboring relations (presented in Section
2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the
substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures
lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt
lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt
lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt
The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O
S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For
example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10
suestrJures
ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt
The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie
suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0
smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be
11
lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1
jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_
uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly
dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec
substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1
are applicable here
Regardless of how tae metbods are applied each metbod bas advantages and disadvantages
Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev
SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle
combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a
larger substructure but a smaller more desirable substructure may be overlooked in the process
Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS
referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones
linimal expansion begins with smaller substructures expands substructures along one relation
lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource
imlts of tbe system
2~ Substructure Selection
Aier using tbe metbods of the previous sectlon to construct l set of alternatlve
substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0
onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e
roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie
substructures oased on ~her heuristic quality This section presents the major heuristics that 3re
Jpplicable to substructure evaluation
The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata
cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie
savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e
13
Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad
eiatlons of ~be origmal OCClrre1lces may be reconstructea
T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=
replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla
obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue
ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of
concepts deAned In teClS of the Instantiated substructures
26 Substructure Specialization
Specializing substructures is an essential capabtlity of a substructure discovery system For
Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f
mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences
Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the
aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached
atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this
specialization step allows the substructure discovery system to take advantage of additional
information in the input examples avoid learning overly general substructure concepts and Clore
rapidly discover a specific disjunctive substructure concet T115 section resents an approach to
substructure specialization
One technique for specializing a substructure is to ~rform inductive Inference on the
extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the
substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended
occurrences consists of the substructures obtained by upanding each occurrence llth each
neighboring relation of the occurrence Once the set of eltended occurrences is onstructed
Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added
neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg
relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres
19
2 - Background Knowledampe
Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s
wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e
discovery process along more promising paths through tbe space of alternatlve substructJres T~lS
section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate
substructures that are more likely to list in tbe current application dOClaln
The ~wo major functions of tbe substructure backround knowledge are to main tain boh
lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a
glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of
substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu
substructures are defined in terms of more primitive substructures This arrangement suggests ~n
archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie
substructures serve as the justifications for more complex substructures at a higher level in tle
hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote
substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie
substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures
elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying
tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput
uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are
ultimately justified by the relations in tbe input eumples The substructure discoey roess
may then use these background knowledge substructures as a starting point for ieneratllg
alternative substructures
Other f Jrms of background knowledge apply to the task of substrlcure ilscovery
mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses
~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee
1
L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do
BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do
if (not (member e 0raquo S S U leI
return D
Figure 24 SubstrUcture Discovery ~lgoritbm
Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom
the substructures In S until either the computational limlt L is exceeded or the set of candidate
substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe
heuristics of Section 2A are employed to choose the best substructure from the llternatives in S
The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he
implementation Section 322 descibes the comrutatlon used in StBDlE although otbe
methods may be used For instance the heutlStics could be weighted selected by lhe user or
selected by lhe backcround knowledae However uy substructure selecion method should
involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in
BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of
discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi
Section 23 are then used to construct a set of new substructures that are stored in E Those
subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the
loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res
23
CHAPTER 3
mE SUBDUE SYSTE~I
An implementation of he substructure discovery methodology descrIbed in the frevous
chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments
Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a
substructure concept discoverer and as a mod le in a more robust nachlne learning system In
addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a
sbstructure specialization module and a substructure background knowledge module for utiiizilg
prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd
krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres
ovillCh of these substruc1res are resent in the lnput examples
i
SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of
SubstructJre Speeializer-k of(
C ser-Supplied Input Exampies Substructure Discovery
Figure 31 Tlle SlSDLE System
(a) SUbstructure
[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]
(b) External Representation
I I
~ Sl
i V
~-T t -
I
~ Cl i
(c) Inte~al Represe~tation
Figure 32 Substructure Representation
21
SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do
SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI
return -EWSlBS
Figure 33 Substructure Elpanslon Algorithm
Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae
stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are
stored in For each neighboring relation a new substructure is formed by adding the nelghborrg
-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to
the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een
considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes
~xanded from the original substructure
322 Selection
The heuristic-based substructure discovery module selects for orslderation those
substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs
compactness connectivity and covenge These four heurlstus are used 0 order he set of
alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut
ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~
selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons
~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he
eurstlc value of a substructur
29
urlent set of Input xam-llS
deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e
substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le
Input examples
r rtlations Sl compactnns(S) = ----shy
For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus
conpactness bull 4 bull 1
The third heUristic connectivity measures the amount of extemal connection in 1he
occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of
external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be
onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed
connectivit-Spound) = locel
Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In
Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure
22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO
outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons
Thus connectivity (12 41 bull 13
T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es
31
ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt
First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~
tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as
there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle
substructure as denned in Section 3U The Object names within the substructures le aroltrar
symbols generated by the system for eacb ewly constructed substructure
S bull i lt 11 OBJECT-0001gt f
~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S
and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by
addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5
ae st)red In E
Input Example Substructure
amp i 511
~I Rl I- lSi ~
52 I S I
LJ L
Figure 3-1 Simple Example
33
11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~
1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l
le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~
suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~
by the algorabm
33 Sllbstructure Specialization
The substructure specialiution module In SLBDLE employs t simple teChlllq ue r
s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25
SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added
attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e
)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh
the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e
i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota
eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d
knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e
s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton
rocess
331 Specia1ization Algorithm
Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te
algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~
elamrles
he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~
Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l
osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~
eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e
35
~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor
Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle
mount of specializatlon In Srpc
A-rount_of_spKlaLization( S_) = --_______ (occl
The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture
acKiround knowledge along w1tb the originally discovered substrucure
332 Example
As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a
lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute
-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he
sare substruc~ure emerges as in Figure 3-
S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt
ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule
Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES
A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l
~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli
s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt
Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~
SPECSLBS In thIs example IS
S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt
T~is specialized substructure also las J occurrerces but 3 untque values thus
amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC
specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he
substrucnre background knowledge along with the oriiinally discovered substucur
SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon
about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be
s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred
substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq
SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue
nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization
nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be
mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec
s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks
3 SubstMlcture Background Kllowledge
T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts
39
O T SHA ETRL~GLE SHAPE-SQLARE
Figure 37 Substructure Backgolnd KnowledgeEumpie
ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt
su~ported
As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge
lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle
ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s
SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y
Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute
est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt
he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn
n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and
~1
1--- ~ed v blue ~
i I shyI
I~
I
I
I
I
SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE
Fgure 39 Three Background Knoledge $ucstroJcJres
he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e
new substructu-es In terms of the substructures already known
3 bull 42 IdentifyiD1 Substructures in Eumples
As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-
~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___
~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I
i SH~PETRIAGlE i t
I I
I I I SHAPEmiddotSQlARE
[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]
Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]
Fig-ure 310 Substructure IdentiAcation Eumple
substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles
disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks
--
lll=H Eumple - r
i III
I I H
8r- shyj
V V
Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red
C Value(S)a8
L 6 a
al uelt S)-312
alueltS)-21
0 alue(S)96
middotalue(S)-11
6 I
bt
10 ~ V
- -
I
alue(S)middot31~ aue(S~-96 -- - middotalue(S)92
A I
j
aI ValueS)-312 I
I I
I bull
---
- -I
Vaiue(S)-103
rgure 41 First Etample of EXiIrinent 1
elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS
of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll
EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$
overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0
consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r
be Input examples
2 Ex~riment 2 Specialization and Background Knowledge
The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem
Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0
~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS
lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege
During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task
As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut
discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs
acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa
Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed
subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk
The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure
-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e
cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s
shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese
discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized
substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed
rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces
the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed
substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe
background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to
deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe
background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres
Ii
C 0
H- C C - Ii C1
(Sr C1 rl
C C ~
C C C-H C C- ii C C-Ii 1 i
~ Sr- C C C
H-C C-Ii H C
Ii H
H
fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule
13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples
tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~
of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna
clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn
whIch to focus ile attentIon of the conceptual clusterI~ rocess
A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA
Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer
lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific
Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e
added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features
t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and
CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered
Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe
SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas
r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single
)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d
subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle
substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure
spannng ncre han one ualple
Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni
and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle
53
nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy
descrlOed il [10gensel87J the two best d usterngs discovered are
Color of engine Aheels IS Blackmiddot middotWhitemiddot
W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd
subs-~c~~re discovery =odule is
lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt
In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By
addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI
exa~Fles Je ~wo best lusterngs discovered lre
umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four
Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec
JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure
attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen
har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE
T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~
suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the
StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne
IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree
that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators
within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover
macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~
CiscJsses similrlties and differences between SLBDLE and PLoSD
Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure
of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The
Jperators forths conaln are taken from [isson80] and are repeated e10w
PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY
PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)
STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)
LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)
For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec
e
51
STACK(XZ)
~ CSTACKCYZ) PICKLPOi)
PCTDOW(Y)
Figure 49 Diseovered (alto-operator
system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy
vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be
~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex
45 Eltperiment 5 Data Abstraction and Feature Formation
EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he
cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3
arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng
on the same Tu3S Instruments Eplorer as the SLBDCE system
Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les
given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores
to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in
the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of
178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J
5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C
SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c
61
--
~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll
Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~
hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH
program searCles for tWo tpes of substrucure In tbe blocks world domain The first type
involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a
set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by
the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes
of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall
1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he
ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of
SlBDLE
lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S
cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce
OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence
~ ~ shyB
I ~ (
E G- c c=7 0 IA B CD
i Lr
(a) ( )
63
A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1
0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL
E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL
The tommcm-realiotships-list contairs the four relations Ossessed by more than half the
candidates
Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt
~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le
Sumber of propUiH In Intltwto
Number of propenlH n Ilnlon
vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-
ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre
A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5
65
~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e
ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~
difficult Running both tbe sequence and common relation discovery processes nore tban once
mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl
difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd
domain
T~e final comFarlson of 11e two syseS Involves ~he representation used to store the
discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the
subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe
sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot
lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1
roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse
Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines
comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe
semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto
oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0
SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert
background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l
more general epresentaticn such as the semantic network used y ~he ARCH pro~ran
53 COfDitive Optimization in olrs S~R Program
R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte
~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt
[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S
Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-
-(5-s)C =---shy
s
slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s
represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1
polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a
inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR
seeks a grammar whose eiementS m8llnize eVe
S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n
322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number
of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f
OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings
of he substrucre can be upressed as
Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy
cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not
conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on
he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into
~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture
lmprovements to tbe cognltive savings measure
~ Substructure Oiscovery ill Uihitehalls PLAD System
Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons
These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn
69
The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as
Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs
efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda
overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta
mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence
Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ
COXTEXT 1 ogsav macro OS
100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot
8 5 ~u - (~tmiddot z)middot
Instantiated Action Sequlnce lt B 112 Z ~11J Z
COlEXT 2 ogsav macro~s
20 ~l bull (~lumiddot Z)shy
Instantiated Action ~uence A B 11
Al interesting macrops discovered End
Figure 53 PLAXD Etam~le
1
OLPTER 6
COSC1USION
Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S
of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty
Url about such an environnent the entity nust abstract over unInuresting jetJil and discover
substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s
teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced
domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l
this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve
61 Summary
The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl
onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s
aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e
environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge
equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl
aleratves nvoiec In 3 computational method 01 discovering sustrucure
A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y
he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans
omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg
t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre
imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5
T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~
iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~
he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~
optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~
The ~ialization module modina tile substructure to apply in a more constrained ewironme
Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i
nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S
aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f
be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes
and the possible applications of the SLBDLE substructure discovery system in a vanety f
domaltls
Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf
~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-
lass vf problems in the area of nachine learning OJeraton In real-world domains demands f
eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates
n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll
of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes
Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1
S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~
62 Future Research
$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE
system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei
include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS
Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy
reVIOUS suess of silIlLu rJe appiiauns
esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of
substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can
tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]
lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can
be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions
in both the background knowledge architecture 3nd the opeations peforned by the discovery
algorlthm
Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess
may arise from a better method of substrlctie instantiation C rrent letbocs do not
satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation
by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C
lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres
exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba
Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods
would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery
process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt
the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be
discovered with a single application of the discovery algcnthr1
Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~
uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve
the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery
rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge
z~ess l ~
One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount
knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a
knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts
623 Applications
Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste
SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and
compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung
systems
One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage
composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e
eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon
slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of
abstraction
The n~ to access information fom larse databases has betooe oonlonlac In ncs
omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so
do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver
repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard
Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte
database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture
sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1
9
REFERE~CES
[Doyie79J
[HoaS3]
L --]~ ason
~Iicalski33bl
G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)
J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162
J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27
R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288
K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987
W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933
W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947
J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J
General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16
J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977
D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC
~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239
R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy
R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13
R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363
S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3
T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50
J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp
~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~
83
- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es
DeJ IS 11
dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T
s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11
nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11
race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T
Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order
=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je
~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form
(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I
The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC
The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure
ltela01s s the list ~f ~eatlons deining le substrucure r occurenc
In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage
dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec
85
gt bull ~
)~c1 tt I
0 c5 ~ -
13 Output for First Example of Experiment 1 Limit - 6
3qll rilebullbullbullbull
anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~
Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll
S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi
-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I
(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l
Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD
-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i
S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)
11 aCC
A14 Output for First Example of E(periment 1 Limit 10
gt svIdue liait 10)
n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t
Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco
T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~
S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO
~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS
~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J
A 16 Input for Second Example ot Experiment 1
Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J
conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)
Al7 Output for Second Example ot Etperitnent 1 Limit - 4
l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I
5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs
c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I
middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt
middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I
So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS
coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j
ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103
~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~
89
S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a
ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO
~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H
ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09
lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()
~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull
[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I
l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i
[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~
C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i
SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1
co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i
middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))
Ec ~lce
Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10
Betti ~Ice
i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~
umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1
SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -
1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G
91
s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no
TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~
~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f
=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ
~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i
A2 Experiment 2
Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd
saostructure background knowledge nodule Two examples from the chemical dooain ae
eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng
previously disc ered subst-uc~ures to subsequent discovery tasks
11 Input for First Eumple of poundXperimenl 2
kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i
S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))
slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)
sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J
~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI
Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))
VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)
( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I
(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i
(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I
sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)
Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j
SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)
~o e ) ~Wlnt ubstUC~oltes
Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo
95
gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~
U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l
~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J
carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )
Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))
Defxollrii ilill J ~cIl Clr carJ)
I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)
Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J
ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))
A32 Output for Experiment 3
gt lubclue inl~ JO)
Seli tlcebullbull_
m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil
SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo
aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot
ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~
licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)
Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc
wri OCCtilJtOiCS
101
4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4
~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i
(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))
42 Output for Experiment 4
PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil
Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)
ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl
u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1
O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~
s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI
W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -
~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a
S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1
TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a
103
~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~
Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-
bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)
ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)
I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))
r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l
(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)
(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)
coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)
Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i
(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at
Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e
(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ
~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj
OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs
I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten
~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull
9
SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1
I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1
Addina substructures 0 SKbullbullbull
O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1
5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I
A3 Experimeut 3
Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15
hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le
desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e
99
sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~
5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~
1~Ie c2 lt6 ~u)
QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32
c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)
lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J
SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))
A52 Output for Experiment S
gt (sulxh bullbull lmu )
Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll
RUMinl sulntucture discovey
DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl
Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~
WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11
( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I
(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I
[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt
~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90
Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J
ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot
~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I
10$
bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D
icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)
([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD
((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I
([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~
101
middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a
oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0
CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1
Ed ~lCe
109
OISCOVERIG StBSTRCCTLRE 1 EX--lPLES
8Y
L- WRECE 8RLCE HOLDER JR
8S Lmiddotrive-sity of Illinois at Lrbana-Cbampaign 1986
THESIS
Submitted in partlll fulfillment of tne requIrements fvr tne degree of -laster of Sciene in Computer Science
in the Graduau Cgtlege of the Lnwersity of Illinois at Lrbana-Champaign 1988
Lrbana IllinOIS
Lawrence Bruce Holder Jr 1S Department of Computer Science
tniversity of Illinois at trbana-Champaign 1988 Robert Earl Stepp m Advisor
T~llS thesis describes a method for disltovering substructure oncepts in uamples The method
Involves a computationally constrained best-first search guieed by four heuristics cognitive
savings compactness connectivity and coverage Each heuristic s described in detail along with its
role In ealuatng an individual substructure concept The SlBOlE system that implements the
method contains a substructure discovery module a substructure specialization module and an
mcemenal substructure background knowledge module for applying previously discovered
substructure concepts The substructure background knowledge includes both user-defined and
SLBDLE-discovered substructures in a hierarchy that is used to determine which substructures art
present in the input eumples The system bas performed well on a number of eumples from
dderent domains and has discovered many interesting substructure conceU sucb as an aromatic
rrg and a macro-operator for stacking blocks The nethod and implementation of the SlBDLE
sstem are described and an analysis of experimental results is resented
iii
IJould like 0 thank my advisor Robert Stepp for ~S uoerous onrbutors tJ te etas
within this thesIs His talent for Insigbtful thinking and patient communicatlon has nade c
work both enjoyable and rewarding
I would also like to thanJc Brad Whiteball Bob Reinke Bharat Rao Diane COOk and Bob 1041
for their comments and suggestions regarding this work Tlanks also to the rest of the Artficai
Intelligence Researlh Group at the Coordinated Science Laboratory especially to my office mate
~ott Benr~ett for his patient participation in our discussions Special thanks to our s~e3r
Fancle Bridges for her ever jleasant assistance
Finally I would like to thank my family for their enthusiastic support and encouragement
brougcout my academic endeavors
This researCl was partIally supported by the ationa1 Science Foundation under grant SF
IST-85-11170 the Offe of ~aval Researcll under grant ~OOO14-82-Kmiddot0186 by Defense Advanced
Research ProJ~ts Agency under grant ~OOOl$-ampi-K-Oamp7$ and by a gIft from Texas rnstrumenu
Inc
T vlLE OF CO-TETS
CHAPTER
1 ITRODlCTIO
2 SlBSTRlcrtRE DISCQER- ~ ~ 5 21 1otivations 6 22 Substructure ReprSentatlon
I
23 Substructure Gone-ation 11 2- Substructure Soeleclon 13 25 Substructure Instantiatlon 17 26 Substructure Specialization 19 21 Background Knowledge 21 2S Substructure Dlsovery ~lgorltln
3 THE SlBDLE SYSTE)( 25 31 Substructure Representation 26 32 Heurlstlc-Based Substructure Discovery 25
321 GeneratIon 25 322 Selection 29 323 ExamFle 32
33 Substructure Specialization 35 331 Specialization Algorithm 35 332 Esampie 37
34 Substructure Background Knowledge 39
3~1 ~rchltecture -10 342 Identifying Substructures in Et~nplts -13
EXPERI(E-n -16
41 Experiment 1 VarYlng the Computational limit -16 ~2 Experiment 2 Specialization and Background Knowledge 19 3 Experiment 3 DisJverln~ Classifying A~trlbues In )lultilie xamplS 53 - Experiment~ Dls~vertng Iacro-Operators in Proof Trees 56 S EXet1ment S Data Abstraction and Feature Forllatlon 59
c REL~TED VORK 62 51 Gestalt Pshoiogy 62
52 Discovenrg Grop5 f Objects In Winstgtnmiddots ARCH P-cgram 63 ~3 Cvintlve OptlrlzJt~n in Wolffs SPR Pcgrar 67 S Substucure Dtsoery In middotbitalis PL-D System 69
CH~TER
6 COCL tSIO
6 1 Suoary () 2 Fmiddottue Resea-cb ~ _ ~
611 Substucture DiSCOvery and InstantIation 6
622 Background Knowledge 75 623 ApplicatIOns 79
REFERECES 51
APPEDIX A EXPERIYIET D-T~ 5amp AI Experiment 1 56 A2 Expenment 2 93
~ 3 Experllcent 3 99
~ Etperiment J 102 AS Experiment 5 104
vi
QPTER 1
LTRODtCI10N
At any Iven moment the amount of detailed information available fr~m an envlronmel s
overwhelmIng For elample dose observation of a brick wall reveals not only tbe rOINS d
rectang11ar brIckS but also tbe mortAr between tbe bricks the pitted surface of the brIcks and
mortar small cracks In tbe bricks etc Yet hUmans bave the ability to Ignore such detail ard
ettract infermation from the environment at a level of detail that is appropriate for he purpose cf
tbe ocservatlon Even n an unfamiliar environment lumans ignore intricate det1il and iscert
more abstract patterns in tle stimuli of the environment [Witkin83] This thesis describes a
com~utatonal mettod for discovermg abstract patterns or substruct~re in the descriptions 0 a
strucured enVrgtnment
Vhen obser atlons at varying levels of detail are necessary ~uruns are caable of descendrg
nto be more minute structure of the environmental stimuli and centlfymg patterns In arms f
hese stl1cural primitlves From the observations at different levels of detail humans na
construct a lle-archical description of the envirlnoent For nstance tbe brIckmiddot loll an be
descrbed as he rectangular bricks in tbe wall along with tbe interconnecons betmiddotveen tle mcilts
Furthermore each rick can be described in terms of the pitS racks and embedded graIns in he
bnck and each interconnecion can be described by the cl11pcnelts oi h~ mortar that ombme ~
form tle IntercoMecion Thus each Ievei in ~he descrption ~ierarchy rerresents I different eel
of detail for represen~ing ~he environment
Once such a bleoarchy is constructed similar prmitive struc res in diffeent middotWlCrmels
nay suggest tle Uistence of mere abstract featres higher ill n he hlerarhy Tls s~rltl
1
nty reFrese1ng the s~osJc~lre ~S slmpLfrg tergra lU lttl-lt~S I- lcdr e
1ewly ciscoveed substr-cure is s~iaiized by aire~jng adcli0ral s~rmiddotJ~ re T1S )lta ~
s~=stJcture descees l ~ager nore sptcfic poton of the -If~rt set of llut ltumies --e
suostJctU baclqrunc knowledge rcludes botll user-defined and discovered subs~J
definitions These are used to identify instances of substructures tilat exist In a sIVer set of ili
erampies Chapter 2 concludes by outlining a substructure discovery algortlm Incorporatng ~he
aforementlcned cmponents
Chapter 3 describes the implementation of the substructure discovery nethodoiogy ontained
In the StmiddotBDtmiddotE system In addition to the substructure discovery module StBDLE also lcludes a
substructure specaliZltion module and a substructure background know1edge module After a
substructure is d~oveed the substructure lS specIalized and botil tile original and secaiized
substructures are stored in tle background knowlCdge Within the background iinomiddotlledge the
substructures are kept in a hierarclly tbat defines omplu substructures in terms of more primltve
substructures tpon receiving a new set of input examples the background know~dgt rnodJe
determines which of the stored suostructures are resent in the etamples ThJs as he SLBDLE
system runs a hie3rcllical representation of selected structures found in he env ironnent ~s
constructed within tlle background knowledge modJle
Chapter presents several experiments witll the StBDLE systec EXFements wall tne
s1bstructure discovery module indicate that the suostucture generation and substructure seiecton
processes triorm vell in guiding the search towards more interestln~ substructures Other
experiments incorporating botll the substructure background keowledge module lnd spetallutlon
module demonstate tbe performance improvement obtaned by using revlousiy earned
sulstructures In subsequent discovery tASks The input and output data for each exerlent are
isted in ApperdlI A
Chapter strleys -revious work -elated to suostJcure disc)ery T~lS ~orK ilS ak l be euly 1900s Imiddotlen gestalt psychologists began sucying the InderYI~g ncess5 Lmiddotec i
3
CHAPTER 2
stBSTRUCTtJRE DISCOVERY
Te major focus of t1is work IS to investigate methods for discoveing substructure concepts
in examples Substructure dlScovery is tle process of identfYlng concepts desclbl1g n~ere~tIni
and repetitive -hunks- of structure within the individual elements of a set of structured examples
Once such 3 substructure oncept s discovered the descrIptions of the examples can be Simplified
by replacing all occurences of ~he substructure ith a single form that represents the
substructure The simplified descriptions may then be passed to other learning syst~ms In
lddition tbe discovery process nay be apFlied repetitively to urther simplify the deserlptions or
to build a hierarchical interpreuticn of the uamples in terms of hell subparts
ThIS hapter presents ~he components involved in a cgtmilItatlonal nethod for SiJostrJcture
discovery SKtion 21 disc1sses the motivations for developing suet a method anc he importance
of substructure discovery in the 5eld of artificial intelligence Seton 22 1eftnes a stbn~crWt
along with the components comprising a substructure [etods for geneatlng alte-1ative
substructures are presented 1ft Section 21 and the tK1niques used for inteHigently selecting among
these alternatives are discussed in Section 24 Once an appropriate substructure 15 discovered the
substructure is irurantUud for ncb occur-ence of the substructure in the input examples Sect~on
2 discusses substructure irstantiatl0ft ewly discovered sUOstuetures ~an be ipecialized to
describe larger substruc~ures in the input eumples Section 26 disc~sses suosrlctlre
s~aization Section 27 presents methods for using background lo~ledge in the SIoStrlctlre
dlscovery process Finllly Section 28 outlines a substluure Es-oery alsolthm lncorpcrawg
the previously described t~hni~ues
the observaton By tercelvmg only he appropriate Informatior lumans U~ bener abie to el-
the concepts implied by the environment
Thus the undedyu~ motivations for investigating substructure ~overy are the IblqlJlto~
hlenrchical structure in the world and 1e ease with Ntllch bumans can negotiate the henrchy
With an appropriate representation for substructures and the substJcture hierarchy a
substructure discovery sysem can perform the asks Just presented
22 Substructure Representation
A substructure is both a portion of a collection of structurally-related obects and a
cesclption of the concept represented by that portion For uample the detaied structure
CJmposlng ttle concet of a brIck is a substucure of a brck wall Tle collecion of atoms ard
bones compnslng he concept of an aromatic -ing 5 a substrucure of nany Clrgarllc ~cmpouncs
However substrucure concepts are ~ot always nteresng LPOl encountenng aJr1ck wal the
concept of a brick nay not be as interesting as tohe oncept Jf a doorway or Window T~e task of
SUls-ructure discovery is to ~nd i11lDurirtg subsluctue middotoncepts In a given 5-cicltlon d
structurally-related objects The representation of strllcured oncepts sbould be onducive to tile
wk of discovering substructures This sectior resents a grapblcal repfese~tatlon for suctufee
conceptS and a language for describing tne concepts
A coUlCtion of strJctured objects C3n be epfesentec by 1 sTaph CSlng il gnpl
representation a substJctlre is a collection of annotated nodes and eages comt=rlslng a onne~ed
slt~rlph J a larier jTaph The nodes epresenl single oects aics or eiltltols in the
Slbstuc~le lnd tlle edges -epresent the ion~lCtilrs emiddotmiddoteen ea~cns lnd ler arglrents Fr
annctated Ntb gt and o~e In~tl~ed Nltll gtft Te en odt s rnected c ~tl te a ngtce Jnd te
SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St
elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute
Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e
Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the
nput nampie n Figure 211 IS
lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt
red-I ill
green-i SI ~ ~------------~~~lt- --- OBJECT-0001
) ----Rl
1amp --- OBJECT-0002 amp-blue
I~l 154- red LshyI I
I j
blue blue
(1) I1put Example ()) SubstJcure
Fig1Jre 21 Exa~plt Substrlc~re
9
[nput Example Substlcture
A)
V
(b) Object Overlapping Substructure
(c) Object and Relation Overlapping Substructure
Figure 22 Disjoint and Overlappin Substructures
Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~
~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the
densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle
lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1
factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl
Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e
(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing
subsuIctures and substructure speialization to attach contextual nformation to newiy discovered
substructures
2S Substructure mstampl1tialioD
Once an interesting substructure is disltovered the input eumples can be recast by replactng
the obJects and relations of each occurrence of the substructure with a single entity representing
tbe abstact substructure This replacement is termed substructure illstanliatwlt After
pltrforming one substructure instantiation a substrueture discovery system may continue to
discover more abstract substructures in terms of those already instantiated in the input eumples
This section considers two methods of substrueture instantiation
One method of substructure instantiation replaees eaeb occurrence by a single object
Difficulties in using this method arise from representation roblems Although all the ob]fCts and
relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not
replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be
maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only
confounds the insuntiaion problem In the case of object overlap each instartation of the
overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of
relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as
well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre
instantiation using a singie new object Although single object substructure instantiation seens
intuitively promising tbe accompanying representation problems are difficult to overcome
An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of
the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the
slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the
17
3 SubStructure Generation
AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~
and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l
bottom-up and top-down
The bottom-up approach to substructure generation be~lns with the smallest substrJctures n
the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished
by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0
the substructure For enmple according to tbe tllee neighboring relations (presented in Section
2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the
substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures
lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt
lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt
lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt
The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O
S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For
example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10
suestrJures
ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt
The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie
suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0
smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be
11
lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1
jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_
uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly
dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec
substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1
are applicable here
Regardless of how tae metbods are applied each metbod bas advantages and disadvantages
Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev
SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle
combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a
larger substructure but a smaller more desirable substructure may be overlooked in the process
Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS
referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones
linimal expansion begins with smaller substructures expands substructures along one relation
lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource
imlts of tbe system
2~ Substructure Selection
Aier using tbe metbods of the previous sectlon to construct l set of alternatlve
substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0
onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e
roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie
substructures oased on ~her heuristic quality This section presents the major heuristics that 3re
Jpplicable to substructure evaluation
The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata
cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie
savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e
13
Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad
eiatlons of ~be origmal OCClrre1lces may be reconstructea
T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=
replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla
obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue
ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of
concepts deAned In teClS of the Instantiated substructures
26 Substructure Specialization
Specializing substructures is an essential capabtlity of a substructure discovery system For
Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f
mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences
Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the
aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached
atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this
specialization step allows the substructure discovery system to take advantage of additional
information in the input examples avoid learning overly general substructure concepts and Clore
rapidly discover a specific disjunctive substructure concet T115 section resents an approach to
substructure specialization
One technique for specializing a substructure is to ~rform inductive Inference on the
extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the
substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended
occurrences consists of the substructures obtained by upanding each occurrence llth each
neighboring relation of the occurrence Once the set of eltended occurrences is onstructed
Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added
neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg
relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres
19
2 - Background Knowledampe
Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s
wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e
discovery process along more promising paths through tbe space of alternatlve substructJres T~lS
section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate
substructures that are more likely to list in tbe current application dOClaln
The ~wo major functions of tbe substructure backround knowledge are to main tain boh
lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a
glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of
substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu
substructures are defined in terms of more primitive substructures This arrangement suggests ~n
archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie
substructures serve as the justifications for more complex substructures at a higher level in tle
hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote
substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie
substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures
elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying
tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput
uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are
ultimately justified by the relations in tbe input eumples The substructure discoey roess
may then use these background knowledge substructures as a starting point for ieneratllg
alternative substructures
Other f Jrms of background knowledge apply to the task of substrlcure ilscovery
mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses
~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee
1
L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do
BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do
if (not (member e 0raquo S S U leI
return D
Figure 24 SubstrUcture Discovery ~lgoritbm
Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom
the substructures In S until either the computational limlt L is exceeded or the set of candidate
substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe
heuristics of Section 2A are employed to choose the best substructure from the llternatives in S
The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he
implementation Section 322 descibes the comrutatlon used in StBDlE although otbe
methods may be used For instance the heutlStics could be weighted selected by lhe user or
selected by lhe backcround knowledae However uy substructure selecion method should
involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in
BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of
discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi
Section 23 are then used to construct a set of new substructures that are stored in E Those
subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the
loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res
23
CHAPTER 3
mE SUBDUE SYSTE~I
An implementation of he substructure discovery methodology descrIbed in the frevous
chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments
Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a
substructure concept discoverer and as a mod le in a more robust nachlne learning system In
addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a
sbstructure specialization module and a substructure background knowledge module for utiiizilg
prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd
krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres
ovillCh of these substruc1res are resent in the lnput examples
i
SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of
SubstructJre Speeializer-k of(
C ser-Supplied Input Exampies Substructure Discovery
Figure 31 Tlle SlSDLE System
(a) SUbstructure
[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]
(b) External Representation
I I
~ Sl
i V
~-T t -
I
~ Cl i
(c) Inte~al Represe~tation
Figure 32 Substructure Representation
21
SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do
SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI
return -EWSlBS
Figure 33 Substructure Elpanslon Algorithm
Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae
stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are
stored in For each neighboring relation a new substructure is formed by adding the nelghborrg
-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to
the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een
considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes
~xanded from the original substructure
322 Selection
The heuristic-based substructure discovery module selects for orslderation those
substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs
compactness connectivity and covenge These four heurlstus are used 0 order he set of
alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut
ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~
selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons
~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he
eurstlc value of a substructur
29
urlent set of Input xam-llS
deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e
substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le
Input examples
r rtlations Sl compactnns(S) = ----shy
For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus
conpactness bull 4 bull 1
The third heUristic connectivity measures the amount of extemal connection in 1he
occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of
external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be
onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed
connectivit-Spound) = locel
Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In
Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure
22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO
outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons
Thus connectivity (12 41 bull 13
T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es
31
ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt
First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~
tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as
there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle
substructure as denned in Section 3U The Object names within the substructures le aroltrar
symbols generated by the system for eacb ewly constructed substructure
S bull i lt 11 OBJECT-0001gt f
~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S
and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by
addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5
ae st)red In E
Input Example Substructure
amp i 511
~I Rl I- lSi ~
52 I S I
LJ L
Figure 3-1 Simple Example
33
11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~
1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l
le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~
suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~
by the algorabm
33 Sllbstructure Specialization
The substructure specialiution module In SLBDLE employs t simple teChlllq ue r
s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25
SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added
attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e
)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh
the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e
i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota
eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d
knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e
s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton
rocess
331 Specia1ization Algorithm
Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te
algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~
elamrles
he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~
Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l
osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~
eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e
35
~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor
Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle
mount of specializatlon In Srpc
A-rount_of_spKlaLization( S_) = --_______ (occl
The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture
acKiround knowledge along w1tb the originally discovered substrucure
332 Example
As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a
lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute
-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he
sare substruc~ure emerges as in Figure 3-
S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt
ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule
Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES
A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l
~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli
s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt
Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~
SPECSLBS In thIs example IS
S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt
T~is specialized substructure also las J occurrerces but 3 untque values thus
amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC
specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he
substrucnre background knowledge along with the oriiinally discovered substucur
SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon
about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be
s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred
substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq
SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue
nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization
nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be
mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec
s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks
3 SubstMlcture Background Kllowledge
T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts
39
O T SHA ETRL~GLE SHAPE-SQLARE
Figure 37 Substructure Backgolnd KnowledgeEumpie
ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt
su~ported
As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge
lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle
ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s
SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y
Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute
est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt
he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn
n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and
~1
1--- ~ed v blue ~
i I shyI
I~
I
I
I
I
SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE
Fgure 39 Three Background Knoledge $ucstroJcJres
he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e
new substructu-es In terms of the substructures already known
3 bull 42 IdentifyiD1 Substructures in Eumples
As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-
~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___
~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I
i SH~PETRIAGlE i t
I I
I I I SHAPEmiddotSQlARE
[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]
Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]
Fig-ure 310 Substructure IdentiAcation Eumple
substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles
disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks
--
lll=H Eumple - r
i III
I I H
8r- shyj
V V
Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red
C Value(S)a8
L 6 a
al uelt S)-312
alueltS)-21
0 alue(S)96
middotalue(S)-11
6 I
bt
10 ~ V
- -
I
alue(S)middot31~ aue(S~-96 -- - middotalue(S)92
A I
j
aI ValueS)-312 I
I I
I bull
---
- -I
Vaiue(S)-103
rgure 41 First Etample of EXiIrinent 1
elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS
of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll
EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$
overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0
consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r
be Input examples
2 Ex~riment 2 Specialization and Background Knowledge
The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem
Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0
~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS
lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege
During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task
As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut
discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs
acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa
Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed
subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk
The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure
-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e
cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s
shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese
discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized
substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed
rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces
the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed
substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe
background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to
deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe
background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres
Ii
C 0
H- C C - Ii C1
(Sr C1 rl
C C ~
C C C-H C C- ii C C-Ii 1 i
~ Sr- C C C
H-C C-Ii H C
Ii H
H
fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule
13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples
tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~
of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna
clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn
whIch to focus ile attentIon of the conceptual clusterI~ rocess
A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA
Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer
lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific
Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e
added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features
t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and
CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered
Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe
SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas
r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single
)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d
subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle
substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure
spannng ncre han one ualple
Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni
and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle
53
nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy
descrlOed il [10gensel87J the two best d usterngs discovered are
Color of engine Aheels IS Blackmiddot middotWhitemiddot
W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd
subs-~c~~re discovery =odule is
lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt
In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By
addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI
exa~Fles Je ~wo best lusterngs discovered lre
umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four
Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec
JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure
attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen
har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE
T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~
suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the
StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne
IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree
that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators
within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover
macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~
CiscJsses similrlties and differences between SLBDLE and PLoSD
Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure
of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The
Jperators forths conaln are taken from [isson80] and are repeated e10w
PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY
PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)
STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)
LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)
For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec
e
51
STACK(XZ)
~ CSTACKCYZ) PICKLPOi)
PCTDOW(Y)
Figure 49 Diseovered (alto-operator
system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy
vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be
~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex
45 Eltperiment 5 Data Abstraction and Feature Formation
EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he
cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3
arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng
on the same Tu3S Instruments Eplorer as the SLBDCE system
Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les
given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores
to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in
the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of
178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J
5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C
SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c
61
--
~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll
Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~
hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH
program searCles for tWo tpes of substrucure In tbe blocks world domain The first type
involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a
set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by
the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes
of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall
1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he
ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of
SlBDLE
lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S
cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce
OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence
~ ~ shyB
I ~ (
E G- c c=7 0 IA B CD
i Lr
(a) ( )
63
A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1
0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL
E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL
The tommcm-realiotships-list contairs the four relations Ossessed by more than half the
candidates
Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt
~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le
Sumber of propUiH In Intltwto
Number of propenlH n Ilnlon
vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-
ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre
A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5
65
~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e
ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~
difficult Running both tbe sequence and common relation discovery processes nore tban once
mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl
difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd
domain
T~e final comFarlson of 11e two syseS Involves ~he representation used to store the
discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the
subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe
sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot
lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1
roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse
Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines
comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe
semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto
oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0
SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert
background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l
more general epresentaticn such as the semantic network used y ~he ARCH pro~ran
53 COfDitive Optimization in olrs S~R Program
R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte
~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt
[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S
Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-
-(5-s)C =---shy
s
slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s
represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1
polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a
inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR
seeks a grammar whose eiementS m8llnize eVe
S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n
322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number
of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f
OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings
of he substrucre can be upressed as
Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy
cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not
conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on
he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into
~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture
lmprovements to tbe cognltive savings measure
~ Substructure Oiscovery ill Uihitehalls PLAD System
Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons
These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn
69
The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as
Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs
efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda
overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta
mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence
Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ
COXTEXT 1 ogsav macro OS
100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot
8 5 ~u - (~tmiddot z)middot
Instantiated Action Sequlnce lt B 112 Z ~11J Z
COlEXT 2 ogsav macro~s
20 ~l bull (~lumiddot Z)shy
Instantiated Action ~uence A B 11
Al interesting macrops discovered End
Figure 53 PLAXD Etam~le
1
OLPTER 6
COSC1USION
Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S
of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty
Url about such an environnent the entity nust abstract over unInuresting jetJil and discover
substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s
teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced
domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l
this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve
61 Summary
The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl
onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s
aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e
environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge
equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl
aleratves nvoiec In 3 computational method 01 discovering sustrucure
A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y
he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans
omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg
t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre
imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5
T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~
iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~
he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~
optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~
The ~ialization module modina tile substructure to apply in a more constrained ewironme
Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i
nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S
aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f
be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes
and the possible applications of the SLBDLE substructure discovery system in a vanety f
domaltls
Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf
~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-
lass vf problems in the area of nachine learning OJeraton In real-world domains demands f
eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates
n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll
of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes
Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1
S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~
62 Future Research
$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE
system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei
include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS
Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy
reVIOUS suess of silIlLu rJe appiiauns
esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of
substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can
tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]
lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can
be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions
in both the background knowledge architecture 3nd the opeations peforned by the discovery
algorlthm
Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess
may arise from a better method of substrlctie instantiation C rrent letbocs do not
satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation
by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C
lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres
exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba
Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods
would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery
process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt
the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be
discovered with a single application of the discovery algcnthr1
Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~
uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve
the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery
rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge
z~ess l ~
One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount
knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a
knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts
623 Applications
Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste
SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and
compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung
systems
One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage
composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e
eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon
slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of
abstraction
The n~ to access information fom larse databases has betooe oonlonlac In ncs
omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so
do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver
repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard
Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte
database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture
sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1
9
REFERE~CES
[Doyie79J
[HoaS3]
L --]~ ason
~Iicalski33bl
G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)
J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162
J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27
R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288
K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987
W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933
W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947
J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J
General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16
J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977
D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC
~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239
R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy
R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13
R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363
S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3
T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50
J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp
~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~
83
- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es
DeJ IS 11
dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T
s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11
nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11
race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T
Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order
=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je
~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form
(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I
The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC
The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure
ltela01s s the list ~f ~eatlons deining le substrucure r occurenc
In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage
dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec
85
gt bull ~
)~c1 tt I
0 c5 ~ -
13 Output for First Example of Experiment 1 Limit - 6
3qll rilebullbullbullbull
anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~
Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll
S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi
-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I
(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l
Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD
-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i
S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)
11 aCC
A14 Output for First Example of E(periment 1 Limit 10
gt svIdue liait 10)
n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t
Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco
T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~
S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO
~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS
~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J
A 16 Input for Second Example ot Experiment 1
Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J
conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)
Al7 Output for Second Example ot Etperitnent 1 Limit - 4
l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I
5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs
c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I
middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt
middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I
So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS
coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j
ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103
~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~
89
S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a
ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO
~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H
ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09
lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()
~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull
[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I
l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i
[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~
C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i
SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1
co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i
middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))
Ec ~lce
Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10
Betti ~Ice
i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~
umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1
SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -
1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G
91
s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no
TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~
~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f
=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ
~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i
A2 Experiment 2
Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd
saostructure background knowledge nodule Two examples from the chemical dooain ae
eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng
previously disc ered subst-uc~ures to subsequent discovery tasks
11 Input for First Eumple of poundXperimenl 2
kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i
S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))
slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)
sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J
~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI
Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))
VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)
( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I
(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i
(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I
sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)
Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j
SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)
~o e ) ~Wlnt ubstUC~oltes
Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo
95
gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~
U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l
~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J
carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )
Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))
Defxollrii ilill J ~cIl Clr carJ)
I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)
Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J
ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))
A32 Output for Experiment 3
gt lubclue inl~ JO)
Seli tlcebullbull_
m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil
SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo
aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot
ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~
licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)
Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc
wri OCCtilJtOiCS
101
4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4
~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i
(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))
42 Output for Experiment 4
PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil
Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)
ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl
u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1
O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~
s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI
W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -
~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a
S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1
TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a
103
~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~
Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-
bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)
ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)
I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))
r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l
(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)
(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)
coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)
Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i
(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at
Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e
(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ
~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj
OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs
I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten
~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull
9
SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1
I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1
Addina substructures 0 SKbullbullbull
O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1
5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I
A3 Experimeut 3
Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15
hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le
desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e
99
sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~
5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~
1~Ie c2 lt6 ~u)
QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32
c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)
lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J
SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))
A52 Output for Experiment S
gt (sulxh bullbull lmu )
Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll
RUMinl sulntucture discovey
DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl
Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~
WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11
( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I
(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I
[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt
~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90
Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J
ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot
~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I
10$
bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D
icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)
([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD
((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I
([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~
101
middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a
oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0
CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1
Ed ~lCe
109
Lawrence Bruce Holder Jr 1S Department of Computer Science
tniversity of Illinois at trbana-Champaign 1988 Robert Earl Stepp m Advisor
T~llS thesis describes a method for disltovering substructure oncepts in uamples The method
Involves a computationally constrained best-first search guieed by four heuristics cognitive
savings compactness connectivity and coverage Each heuristic s described in detail along with its
role In ealuatng an individual substructure concept The SlBOlE system that implements the
method contains a substructure discovery module a substructure specialization module and an
mcemenal substructure background knowledge module for applying previously discovered
substructure concepts The substructure background knowledge includes both user-defined and
SLBDLE-discovered substructures in a hierarchy that is used to determine which substructures art
present in the input eumples The system bas performed well on a number of eumples from
dderent domains and has discovered many interesting substructure conceU sucb as an aromatic
rrg and a macro-operator for stacking blocks The nethod and implementation of the SlBDLE
sstem are described and an analysis of experimental results is resented
iii
IJould like 0 thank my advisor Robert Stepp for ~S uoerous onrbutors tJ te etas
within this thesIs His talent for Insigbtful thinking and patient communicatlon has nade c
work both enjoyable and rewarding
I would also like to thanJc Brad Whiteball Bob Reinke Bharat Rao Diane COOk and Bob 1041
for their comments and suggestions regarding this work Tlanks also to the rest of the Artficai
Intelligence Researlh Group at the Coordinated Science Laboratory especially to my office mate
~ott Benr~ett for his patient participation in our discussions Special thanks to our s~e3r
Fancle Bridges for her ever jleasant assistance
Finally I would like to thank my family for their enthusiastic support and encouragement
brougcout my academic endeavors
This researCl was partIally supported by the ationa1 Science Foundation under grant SF
IST-85-11170 the Offe of ~aval Researcll under grant ~OOO14-82-Kmiddot0186 by Defense Advanced
Research ProJ~ts Agency under grant ~OOOl$-ampi-K-Oamp7$ and by a gIft from Texas rnstrumenu
Inc
T vlLE OF CO-TETS
CHAPTER
1 ITRODlCTIO
2 SlBSTRlcrtRE DISCQER- ~ ~ 5 21 1otivations 6 22 Substructure ReprSentatlon
I
23 Substructure Gone-ation 11 2- Substructure Soeleclon 13 25 Substructure Instantiatlon 17 26 Substructure Specialization 19 21 Background Knowledge 21 2S Substructure Dlsovery ~lgorltln
3 THE SlBDLE SYSTE)( 25 31 Substructure Representation 26 32 Heurlstlc-Based Substructure Discovery 25
321 GeneratIon 25 322 Selection 29 323 ExamFle 32
33 Substructure Specialization 35 331 Specialization Algorithm 35 332 Esampie 37
34 Substructure Background Knowledge 39
3~1 ~rchltecture -10 342 Identifying Substructures in Et~nplts -13
EXPERI(E-n -16
41 Experiment 1 VarYlng the Computational limit -16 ~2 Experiment 2 Specialization and Background Knowledge 19 3 Experiment 3 DisJverln~ Classifying A~trlbues In )lultilie xamplS 53 - Experiment~ Dls~vertng Iacro-Operators in Proof Trees 56 S EXet1ment S Data Abstraction and Feature Forllatlon 59
c REL~TED VORK 62 51 Gestalt Pshoiogy 62
52 Discovenrg Grop5 f Objects In Winstgtnmiddots ARCH P-cgram 63 ~3 Cvintlve OptlrlzJt~n in Wolffs SPR Pcgrar 67 S Substucure Dtsoery In middotbitalis PL-D System 69
CH~TER
6 COCL tSIO
6 1 Suoary () 2 Fmiddottue Resea-cb ~ _ ~
611 Substucture DiSCOvery and InstantIation 6
622 Background Knowledge 75 623 ApplicatIOns 79
REFERECES 51
APPEDIX A EXPERIYIET D-T~ 5amp AI Experiment 1 56 A2 Expenment 2 93
~ 3 Experllcent 3 99
~ Etperiment J 102 AS Experiment 5 104
vi
QPTER 1
LTRODtCI10N
At any Iven moment the amount of detailed information available fr~m an envlronmel s
overwhelmIng For elample dose observation of a brick wall reveals not only tbe rOINS d
rectang11ar brIckS but also tbe mortAr between tbe bricks the pitted surface of the brIcks and
mortar small cracks In tbe bricks etc Yet hUmans bave the ability to Ignore such detail ard
ettract infermation from the environment at a level of detail that is appropriate for he purpose cf
tbe ocservatlon Even n an unfamiliar environment lumans ignore intricate det1il and iscert
more abstract patterns in tle stimuli of the environment [Witkin83] This thesis describes a
com~utatonal mettod for discovermg abstract patterns or substruct~re in the descriptions 0 a
strucured enVrgtnment
Vhen obser atlons at varying levels of detail are necessary ~uruns are caable of descendrg
nto be more minute structure of the environmental stimuli and centlfymg patterns In arms f
hese stl1cural primitlves From the observations at different levels of detail humans na
construct a lle-archical description of the envirlnoent For nstance tbe brIckmiddot loll an be
descrbed as he rectangular bricks in tbe wall along with tbe interconnecons betmiddotveen tle mcilts
Furthermore each rick can be described in terms of the pitS racks and embedded graIns in he
bnck and each interconnecion can be described by the cl11pcnelts oi h~ mortar that ombme ~
form tle IntercoMecion Thus each Ievei in ~he descrption ~ierarchy rerresents I different eel
of detail for represen~ing ~he environment
Once such a bleoarchy is constructed similar prmitive struc res in diffeent middotWlCrmels
nay suggest tle Uistence of mere abstract featres higher ill n he hlerarhy Tls s~rltl
1
nty reFrese1ng the s~osJc~lre ~S slmpLfrg tergra lU lttl-lt~S I- lcdr e
1ewly ciscoveed substr-cure is s~iaiized by aire~jng adcli0ral s~rmiddotJ~ re T1S )lta ~
s~=stJcture descees l ~ager nore sptcfic poton of the -If~rt set of llut ltumies --e
suostJctU baclqrunc knowledge rcludes botll user-defined and discovered subs~J
definitions These are used to identify instances of substructures tilat exist In a sIVer set of ili
erampies Chapter 2 concludes by outlining a substructure discovery algortlm Incorporatng ~he
aforementlcned cmponents
Chapter 3 describes the implementation of the substructure discovery nethodoiogy ontained
In the StmiddotBDtmiddotE system In addition to the substructure discovery module StBDLE also lcludes a
substructure specaliZltion module and a substructure background know1edge module After a
substructure is d~oveed the substructure lS specIalized and botil tile original and secaiized
substructures are stored in tle background knowlCdge Within the background iinomiddotlledge the
substructures are kept in a hierarclly tbat defines omplu substructures in terms of more primltve
substructures tpon receiving a new set of input examples the background know~dgt rnodJe
determines which of the stored suostructures are resent in the etamples ThJs as he SLBDLE
system runs a hie3rcllical representation of selected structures found in he env ironnent ~s
constructed within tlle background knowledge modJle
Chapter presents several experiments witll the StBDLE systec EXFements wall tne
s1bstructure discovery module indicate that the suostucture generation and substructure seiecton
processes triorm vell in guiding the search towards more interestln~ substructures Other
experiments incorporating botll the substructure background keowledge module lnd spetallutlon
module demonstate tbe performance improvement obtaned by using revlousiy earned
sulstructures In subsequent discovery tASks The input and output data for each exerlent are
isted in ApperdlI A
Chapter strleys -revious work -elated to suostJcure disc)ery T~lS ~orK ilS ak l be euly 1900s Imiddotlen gestalt psychologists began sucying the InderYI~g ncess5 Lmiddotec i
3
CHAPTER 2
stBSTRUCTtJRE DISCOVERY
Te major focus of t1is work IS to investigate methods for discoveing substructure concepts
in examples Substructure dlScovery is tle process of identfYlng concepts desclbl1g n~ere~tIni
and repetitive -hunks- of structure within the individual elements of a set of structured examples
Once such 3 substructure oncept s discovered the descrIptions of the examples can be Simplified
by replacing all occurences of ~he substructure ith a single form that represents the
substructure The simplified descriptions may then be passed to other learning syst~ms In
lddition tbe discovery process nay be apFlied repetitively to urther simplify the deserlptions or
to build a hierarchical interpreuticn of the uamples in terms of hell subparts
ThIS hapter presents ~he components involved in a cgtmilItatlonal nethod for SiJostrJcture
discovery SKtion 21 disc1sses the motivations for developing suet a method anc he importance
of substructure discovery in the 5eld of artificial intelligence Seton 22 1eftnes a stbn~crWt
along with the components comprising a substructure [etods for geneatlng alte-1ative
substructures are presented 1ft Section 21 and the tK1niques used for inteHigently selecting among
these alternatives are discussed in Section 24 Once an appropriate substructure 15 discovered the
substructure is irurantUud for ncb occur-ence of the substructure in the input examples Sect~on
2 discusses substructure irstantiatl0ft ewly discovered sUOstuetures ~an be ipecialized to
describe larger substruc~ures in the input eumples Section 26 disc~sses suosrlctlre
s~aization Section 27 presents methods for using background lo~ledge in the SIoStrlctlre
dlscovery process Finllly Section 28 outlines a substluure Es-oery alsolthm lncorpcrawg
the previously described t~hni~ues
the observaton By tercelvmg only he appropriate Informatior lumans U~ bener abie to el-
the concepts implied by the environment
Thus the undedyu~ motivations for investigating substructure ~overy are the IblqlJlto~
hlenrchical structure in the world and 1e ease with Ntllch bumans can negotiate the henrchy
With an appropriate representation for substructures and the substJcture hierarchy a
substructure discovery sysem can perform the asks Just presented
22 Substructure Representation
A substructure is both a portion of a collection of structurally-related obects and a
cesclption of the concept represented by that portion For uample the detaied structure
CJmposlng ttle concet of a brIck is a substucure of a brck wall Tle collecion of atoms ard
bones compnslng he concept of an aromatic -ing 5 a substrucure of nany Clrgarllc ~cmpouncs
However substrucure concepts are ~ot always nteresng LPOl encountenng aJr1ck wal the
concept of a brick nay not be as interesting as tohe oncept Jf a doorway or Window T~e task of
SUls-ructure discovery is to ~nd i11lDurirtg subsluctue middotoncepts In a given 5-cicltlon d
structurally-related objects The representation of strllcured oncepts sbould be onducive to tile
wk of discovering substructures This sectior resents a grapblcal repfese~tatlon for suctufee
conceptS and a language for describing tne concepts
A coUlCtion of strJctured objects C3n be epfesentec by 1 sTaph CSlng il gnpl
representation a substJctlre is a collection of annotated nodes and eages comt=rlslng a onne~ed
slt~rlph J a larier jTaph The nodes epresenl single oects aics or eiltltols in the
Slbstuc~le lnd tlle edges -epresent the ion~lCtilrs emiddotmiddoteen ea~cns lnd ler arglrents Fr
annctated Ntb gt and o~e In~tl~ed Nltll gtft Te en odt s rnected c ~tl te a ngtce Jnd te
SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St
elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute
Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e
Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the
nput nampie n Figure 211 IS
lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt
red-I ill
green-i SI ~ ~------------~~~lt- --- OBJECT-0001
) ----Rl
1amp --- OBJECT-0002 amp-blue
I~l 154- red LshyI I
I j
blue blue
(1) I1put Example ()) SubstJcure
Fig1Jre 21 Exa~plt Substrlc~re
9
[nput Example Substlcture
A)
V
(b) Object Overlapping Substructure
(c) Object and Relation Overlapping Substructure
Figure 22 Disjoint and Overlappin Substructures
Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~
~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the
densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle
lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1
factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl
Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e
(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing
subsuIctures and substructure speialization to attach contextual nformation to newiy discovered
substructures
2S Substructure mstampl1tialioD
Once an interesting substructure is disltovered the input eumples can be recast by replactng
the obJects and relations of each occurrence of the substructure with a single entity representing
tbe abstact substructure This replacement is termed substructure illstanliatwlt After
pltrforming one substructure instantiation a substrueture discovery system may continue to
discover more abstract substructures in terms of those already instantiated in the input eumples
This section considers two methods of substrueture instantiation
One method of substructure instantiation replaees eaeb occurrence by a single object
Difficulties in using this method arise from representation roblems Although all the ob]fCts and
relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not
replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be
maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only
confounds the insuntiaion problem In the case of object overlap each instartation of the
overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of
relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as
well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre
instantiation using a singie new object Although single object substructure instantiation seens
intuitively promising tbe accompanying representation problems are difficult to overcome
An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of
the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the
slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the
17
3 SubStructure Generation
AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~
and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l
bottom-up and top-down
The bottom-up approach to substructure generation be~lns with the smallest substrJctures n
the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished
by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0
the substructure For enmple according to tbe tllee neighboring relations (presented in Section
2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the
substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures
lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt
lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt
lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt
The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O
S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For
example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10
suestrJures
ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt
The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie
suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0
smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be
11
lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1
jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_
uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly
dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec
substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1
are applicable here
Regardless of how tae metbods are applied each metbod bas advantages and disadvantages
Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev
SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle
combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a
larger substructure but a smaller more desirable substructure may be overlooked in the process
Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS
referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones
linimal expansion begins with smaller substructures expands substructures along one relation
lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource
imlts of tbe system
2~ Substructure Selection
Aier using tbe metbods of the previous sectlon to construct l set of alternatlve
substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0
onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e
roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie
substructures oased on ~her heuristic quality This section presents the major heuristics that 3re
Jpplicable to substructure evaluation
The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata
cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie
savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e
13
Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad
eiatlons of ~be origmal OCClrre1lces may be reconstructea
T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=
replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla
obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue
ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of
concepts deAned In teClS of the Instantiated substructures
26 Substructure Specialization
Specializing substructures is an essential capabtlity of a substructure discovery system For
Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f
mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences
Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the
aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached
atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this
specialization step allows the substructure discovery system to take advantage of additional
information in the input examples avoid learning overly general substructure concepts and Clore
rapidly discover a specific disjunctive substructure concet T115 section resents an approach to
substructure specialization
One technique for specializing a substructure is to ~rform inductive Inference on the
extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the
substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended
occurrences consists of the substructures obtained by upanding each occurrence llth each
neighboring relation of the occurrence Once the set of eltended occurrences is onstructed
Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added
neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg
relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres
19
2 - Background Knowledampe
Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s
wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e
discovery process along more promising paths through tbe space of alternatlve substructJres T~lS
section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate
substructures that are more likely to list in tbe current application dOClaln
The ~wo major functions of tbe substructure backround knowledge are to main tain boh
lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a
glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of
substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu
substructures are defined in terms of more primitive substructures This arrangement suggests ~n
archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie
substructures serve as the justifications for more complex substructures at a higher level in tle
hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote
substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie
substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures
elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying
tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput
uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are
ultimately justified by the relations in tbe input eumples The substructure discoey roess
may then use these background knowledge substructures as a starting point for ieneratllg
alternative substructures
Other f Jrms of background knowledge apply to the task of substrlcure ilscovery
mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses
~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee
1
L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do
BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do
if (not (member e 0raquo S S U leI
return D
Figure 24 SubstrUcture Discovery ~lgoritbm
Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom
the substructures In S until either the computational limlt L is exceeded or the set of candidate
substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe
heuristics of Section 2A are employed to choose the best substructure from the llternatives in S
The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he
implementation Section 322 descibes the comrutatlon used in StBDlE although otbe
methods may be used For instance the heutlStics could be weighted selected by lhe user or
selected by lhe backcround knowledae However uy substructure selecion method should
involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in
BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of
discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi
Section 23 are then used to construct a set of new substructures that are stored in E Those
subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the
loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res
23
CHAPTER 3
mE SUBDUE SYSTE~I
An implementation of he substructure discovery methodology descrIbed in the frevous
chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments
Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a
substructure concept discoverer and as a mod le in a more robust nachlne learning system In
addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a
sbstructure specialization module and a substructure background knowledge module for utiiizilg
prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd
krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres
ovillCh of these substruc1res are resent in the lnput examples
i
SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of
SubstructJre Speeializer-k of(
C ser-Supplied Input Exampies Substructure Discovery
Figure 31 Tlle SlSDLE System
(a) SUbstructure
[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]
(b) External Representation
I I
~ Sl
i V
~-T t -
I
~ Cl i
(c) Inte~al Represe~tation
Figure 32 Substructure Representation
21
SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do
SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI
return -EWSlBS
Figure 33 Substructure Elpanslon Algorithm
Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae
stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are
stored in For each neighboring relation a new substructure is formed by adding the nelghborrg
-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to
the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een
considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes
~xanded from the original substructure
322 Selection
The heuristic-based substructure discovery module selects for orslderation those
substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs
compactness connectivity and covenge These four heurlstus are used 0 order he set of
alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut
ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~
selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons
~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he
eurstlc value of a substructur
29
urlent set of Input xam-llS
deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e
substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le
Input examples
r rtlations Sl compactnns(S) = ----shy
For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus
conpactness bull 4 bull 1
The third heUristic connectivity measures the amount of extemal connection in 1he
occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of
external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be
onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed
connectivit-Spound) = locel
Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In
Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure
22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO
outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons
Thus connectivity (12 41 bull 13
T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es
31
ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt
First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~
tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as
there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle
substructure as denned in Section 3U The Object names within the substructures le aroltrar
symbols generated by the system for eacb ewly constructed substructure
S bull i lt 11 OBJECT-0001gt f
~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S
and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by
addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5
ae st)red In E
Input Example Substructure
amp i 511
~I Rl I- lSi ~
52 I S I
LJ L
Figure 3-1 Simple Example
33
11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~
1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l
le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~
suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~
by the algorabm
33 Sllbstructure Specialization
The substructure specialiution module In SLBDLE employs t simple teChlllq ue r
s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25
SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added
attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e
)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh
the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e
i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota
eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d
knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e
s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton
rocess
331 Specia1ization Algorithm
Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te
algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~
elamrles
he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~
Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l
osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~
eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e
35
~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor
Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle
mount of specializatlon In Srpc
A-rount_of_spKlaLization( S_) = --_______ (occl
The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture
acKiround knowledge along w1tb the originally discovered substrucure
332 Example
As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a
lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute
-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he
sare substruc~ure emerges as in Figure 3-
S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt
ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule
Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES
A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l
~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli
s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt
Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~
SPECSLBS In thIs example IS
S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt
T~is specialized substructure also las J occurrerces but 3 untque values thus
amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC
specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he
substrucnre background knowledge along with the oriiinally discovered substucur
SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon
about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be
s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred
substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq
SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue
nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization
nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be
mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec
s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks
3 SubstMlcture Background Kllowledge
T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts
39
O T SHA ETRL~GLE SHAPE-SQLARE
Figure 37 Substructure Backgolnd KnowledgeEumpie
ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt
su~ported
As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge
lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle
ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s
SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y
Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute
est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt
he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn
n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and
~1
1--- ~ed v blue ~
i I shyI
I~
I
I
I
I
SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE
Fgure 39 Three Background Knoledge $ucstroJcJres
he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e
new substructu-es In terms of the substructures already known
3 bull 42 IdentifyiD1 Substructures in Eumples
As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-
~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___
~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I
i SH~PETRIAGlE i t
I I
I I I SHAPEmiddotSQlARE
[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]
Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]
Fig-ure 310 Substructure IdentiAcation Eumple
substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles
disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks
--
lll=H Eumple - r
i III
I I H
8r- shyj
V V
Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red
C Value(S)a8
L 6 a
al uelt S)-312
alueltS)-21
0 alue(S)96
middotalue(S)-11
6 I
bt
10 ~ V
- -
I
alue(S)middot31~ aue(S~-96 -- - middotalue(S)92
A I
j
aI ValueS)-312 I
I I
I bull
---
- -I
Vaiue(S)-103
rgure 41 First Etample of EXiIrinent 1
elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS
of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll
EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$
overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0
consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r
be Input examples
2 Ex~riment 2 Specialization and Background Knowledge
The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem
Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0
~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS
lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege
During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task
As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut
discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs
acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa
Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed
subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk
The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure
-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e
cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s
shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese
discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized
substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed
rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces
the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed
substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe
background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to
deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe
background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres
Ii
C 0
H- C C - Ii C1
(Sr C1 rl
C C ~
C C C-H C C- ii C C-Ii 1 i
~ Sr- C C C
H-C C-Ii H C
Ii H
H
fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule
13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples
tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~
of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna
clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn
whIch to focus ile attentIon of the conceptual clusterI~ rocess
A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA
Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer
lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific
Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e
added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features
t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and
CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered
Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe
SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas
r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single
)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d
subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle
substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure
spannng ncre han one ualple
Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni
and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle
53
nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy
descrlOed il [10gensel87J the two best d usterngs discovered are
Color of engine Aheels IS Blackmiddot middotWhitemiddot
W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd
subs-~c~~re discovery =odule is
lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt
In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By
addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI
exa~Fles Je ~wo best lusterngs discovered lre
umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four
Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec
JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure
attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen
har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE
T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~
suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the
StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne
IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree
that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators
within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover
macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~
CiscJsses similrlties and differences between SLBDLE and PLoSD
Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure
of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The
Jperators forths conaln are taken from [isson80] and are repeated e10w
PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY
PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)
STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)
LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)
For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec
e
51
STACK(XZ)
~ CSTACKCYZ) PICKLPOi)
PCTDOW(Y)
Figure 49 Diseovered (alto-operator
system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy
vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be
~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex
45 Eltperiment 5 Data Abstraction and Feature Formation
EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he
cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3
arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng
on the same Tu3S Instruments Eplorer as the SLBDCE system
Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les
given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores
to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in
the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of
178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J
5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C
SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c
61
--
~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll
Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~
hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH
program searCles for tWo tpes of substrucure In tbe blocks world domain The first type
involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a
set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by
the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes
of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall
1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he
ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of
SlBDLE
lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S
cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce
OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence
~ ~ shyB
I ~ (
E G- c c=7 0 IA B CD
i Lr
(a) ( )
63
A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1
0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL
E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL
The tommcm-realiotships-list contairs the four relations Ossessed by more than half the
candidates
Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt
~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le
Sumber of propUiH In Intltwto
Number of propenlH n Ilnlon
vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-
ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre
A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5
65
~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e
ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~
difficult Running both tbe sequence and common relation discovery processes nore tban once
mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl
difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd
domain
T~e final comFarlson of 11e two syseS Involves ~he representation used to store the
discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the
subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe
sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot
lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1
roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse
Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines
comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe
semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto
oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0
SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert
background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l
more general epresentaticn such as the semantic network used y ~he ARCH pro~ran
53 COfDitive Optimization in olrs S~R Program
R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte
~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt
[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S
Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-
-(5-s)C =---shy
s
slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s
represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1
polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a
inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR
seeks a grammar whose eiementS m8llnize eVe
S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n
322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number
of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f
OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings
of he substrucre can be upressed as
Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy
cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not
conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on
he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into
~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture
lmprovements to tbe cognltive savings measure
~ Substructure Oiscovery ill Uihitehalls PLAD System
Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons
These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn
69
The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as
Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs
efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda
overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta
mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence
Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ
COXTEXT 1 ogsav macro OS
100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot
8 5 ~u - (~tmiddot z)middot
Instantiated Action Sequlnce lt B 112 Z ~11J Z
COlEXT 2 ogsav macro~s
20 ~l bull (~lumiddot Z)shy
Instantiated Action ~uence A B 11
Al interesting macrops discovered End
Figure 53 PLAXD Etam~le
1
OLPTER 6
COSC1USION
Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S
of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty
Url about such an environnent the entity nust abstract over unInuresting jetJil and discover
substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s
teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced
domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l
this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve
61 Summary
The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl
onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s
aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e
environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge
equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl
aleratves nvoiec In 3 computational method 01 discovering sustrucure
A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y
he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans
omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg
t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre
imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5
T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~
iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~
he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~
optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~
The ~ialization module modina tile substructure to apply in a more constrained ewironme
Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i
nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S
aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f
be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes
and the possible applications of the SLBDLE substructure discovery system in a vanety f
domaltls
Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf
~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-
lass vf problems in the area of nachine learning OJeraton In real-world domains demands f
eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates
n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll
of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes
Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1
S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~
62 Future Research
$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE
system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei
include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS
Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy
reVIOUS suess of silIlLu rJe appiiauns
esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of
substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can
tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]
lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can
be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions
in both the background knowledge architecture 3nd the opeations peforned by the discovery
algorlthm
Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess
may arise from a better method of substrlctie instantiation C rrent letbocs do not
satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation
by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C
lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres
exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba
Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods
would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery
process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt
the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be
discovered with a single application of the discovery algcnthr1
Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~
uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve
the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery
rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge
z~ess l ~
One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount
knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a
knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts
623 Applications
Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste
SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and
compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung
systems
One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage
composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e
eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon
slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of
abstraction
The n~ to access information fom larse databases has betooe oonlonlac In ncs
omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so
do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver
repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard
Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte
database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture
sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1
9
REFERE~CES
[Doyie79J
[HoaS3]
L --]~ ason
~Iicalski33bl
G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)
J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162
J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27
R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288
K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987
W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933
W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947
J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J
General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16
J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977
D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC
~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239
R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy
R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13
R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363
S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3
T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50
J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp
~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~
83
- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es
DeJ IS 11
dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T
s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11
nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11
race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T
Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order
=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je
~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form
(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I
The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC
The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure
ltela01s s the list ~f ~eatlons deining le substrucure r occurenc
In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage
dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec
85
gt bull ~
)~c1 tt I
0 c5 ~ -
13 Output for First Example of Experiment 1 Limit - 6
3qll rilebullbullbullbull
anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~
Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll
S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi
-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I
(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l
Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD
-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i
S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)
11 aCC
A14 Output for First Example of E(periment 1 Limit 10
gt svIdue liait 10)
n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t
Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco
T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~
S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO
~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS
~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J
A 16 Input for Second Example ot Experiment 1
Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J
conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)
Al7 Output for Second Example ot Etperitnent 1 Limit - 4
l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I
5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs
c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I
middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt
middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I
So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS
coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j
ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103
~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~
89
S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a
ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO
~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H
ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09
lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()
~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull
[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I
l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i
[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~
C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i
SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1
co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i
middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))
Ec ~lce
Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10
Betti ~Ice
i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~
umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1
SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -
1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G
91
s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no
TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~
~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f
=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ
~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i
A2 Experiment 2
Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd
saostructure background knowledge nodule Two examples from the chemical dooain ae
eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng
previously disc ered subst-uc~ures to subsequent discovery tasks
11 Input for First Eumple of poundXperimenl 2
kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i
S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))
slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)
sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J
~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI
Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))
VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)
( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I
(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i
(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I
sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)
Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j
SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)
~o e ) ~Wlnt ubstUC~oltes
Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo
95
gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~
U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l
~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J
carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )
Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))
Defxollrii ilill J ~cIl Clr carJ)
I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)
Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J
ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))
A32 Output for Experiment 3
gt lubclue inl~ JO)
Seli tlcebullbull_
m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil
SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo
aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot
ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~
licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)
Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc
wri OCCtilJtOiCS
101
4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4
~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i
(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))
42 Output for Experiment 4
PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil
Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)
ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl
u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1
O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~
s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI
W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -
~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a
S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1
TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a
103
~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~
Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-
bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)
ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)
I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))
r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l
(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)
(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)
coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)
Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i
(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at
Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e
(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ
~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj
OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs
I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten
~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull
9
SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1
I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1
Addina substructures 0 SKbullbullbull
O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1
5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I
A3 Experimeut 3
Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15
hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le
desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e
99
sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~
5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~
1~Ie c2 lt6 ~u)
QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32
c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)
lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J
SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))
A52 Output for Experiment S
gt (sulxh bullbull lmu )
Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll
RUMinl sulntucture discovey
DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl
Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~
WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11
( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I
(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I
[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt
~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90
Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J
ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot
~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I
10$
bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D
icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)
([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD
((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I
([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~
101
middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a
oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0
CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1
Ed ~lCe
109
IJould like 0 thank my advisor Robert Stepp for ~S uoerous onrbutors tJ te etas
within this thesIs His talent for Insigbtful thinking and patient communicatlon has nade c
work both enjoyable and rewarding
I would also like to thanJc Brad Whiteball Bob Reinke Bharat Rao Diane COOk and Bob 1041
for their comments and suggestions regarding this work Tlanks also to the rest of the Artficai
Intelligence Researlh Group at the Coordinated Science Laboratory especially to my office mate
~ott Benr~ett for his patient participation in our discussions Special thanks to our s~e3r
Fancle Bridges for her ever jleasant assistance
Finally I would like to thank my family for their enthusiastic support and encouragement
brougcout my academic endeavors
This researCl was partIally supported by the ationa1 Science Foundation under grant SF
IST-85-11170 the Offe of ~aval Researcll under grant ~OOO14-82-Kmiddot0186 by Defense Advanced
Research ProJ~ts Agency under grant ~OOOl$-ampi-K-Oamp7$ and by a gIft from Texas rnstrumenu
Inc
T vlLE OF CO-TETS
CHAPTER
1 ITRODlCTIO
2 SlBSTRlcrtRE DISCQER- ~ ~ 5 21 1otivations 6 22 Substructure ReprSentatlon
I
23 Substructure Gone-ation 11 2- Substructure Soeleclon 13 25 Substructure Instantiatlon 17 26 Substructure Specialization 19 21 Background Knowledge 21 2S Substructure Dlsovery ~lgorltln
3 THE SlBDLE SYSTE)( 25 31 Substructure Representation 26 32 Heurlstlc-Based Substructure Discovery 25
321 GeneratIon 25 322 Selection 29 323 ExamFle 32
33 Substructure Specialization 35 331 Specialization Algorithm 35 332 Esampie 37
34 Substructure Background Knowledge 39
3~1 ~rchltecture -10 342 Identifying Substructures in Et~nplts -13
EXPERI(E-n -16
41 Experiment 1 VarYlng the Computational limit -16 ~2 Experiment 2 Specialization and Background Knowledge 19 3 Experiment 3 DisJverln~ Classifying A~trlbues In )lultilie xamplS 53 - Experiment~ Dls~vertng Iacro-Operators in Proof Trees 56 S EXet1ment S Data Abstraction and Feature Forllatlon 59
c REL~TED VORK 62 51 Gestalt Pshoiogy 62
52 Discovenrg Grop5 f Objects In Winstgtnmiddots ARCH P-cgram 63 ~3 Cvintlve OptlrlzJt~n in Wolffs SPR Pcgrar 67 S Substucure Dtsoery In middotbitalis PL-D System 69
CH~TER
6 COCL tSIO
6 1 Suoary () 2 Fmiddottue Resea-cb ~ _ ~
611 Substucture DiSCOvery and InstantIation 6
622 Background Knowledge 75 623 ApplicatIOns 79
REFERECES 51
APPEDIX A EXPERIYIET D-T~ 5amp AI Experiment 1 56 A2 Expenment 2 93
~ 3 Experllcent 3 99
~ Etperiment J 102 AS Experiment 5 104
vi
QPTER 1
LTRODtCI10N
At any Iven moment the amount of detailed information available fr~m an envlronmel s
overwhelmIng For elample dose observation of a brick wall reveals not only tbe rOINS d
rectang11ar brIckS but also tbe mortAr between tbe bricks the pitted surface of the brIcks and
mortar small cracks In tbe bricks etc Yet hUmans bave the ability to Ignore such detail ard
ettract infermation from the environment at a level of detail that is appropriate for he purpose cf
tbe ocservatlon Even n an unfamiliar environment lumans ignore intricate det1il and iscert
more abstract patterns in tle stimuli of the environment [Witkin83] This thesis describes a
com~utatonal mettod for discovermg abstract patterns or substruct~re in the descriptions 0 a
strucured enVrgtnment
Vhen obser atlons at varying levels of detail are necessary ~uruns are caable of descendrg
nto be more minute structure of the environmental stimuli and centlfymg patterns In arms f
hese stl1cural primitlves From the observations at different levels of detail humans na
construct a lle-archical description of the envirlnoent For nstance tbe brIckmiddot loll an be
descrbed as he rectangular bricks in tbe wall along with tbe interconnecons betmiddotveen tle mcilts
Furthermore each rick can be described in terms of the pitS racks and embedded graIns in he
bnck and each interconnecion can be described by the cl11pcnelts oi h~ mortar that ombme ~
form tle IntercoMecion Thus each Ievei in ~he descrption ~ierarchy rerresents I different eel
of detail for represen~ing ~he environment
Once such a bleoarchy is constructed similar prmitive struc res in diffeent middotWlCrmels
nay suggest tle Uistence of mere abstract featres higher ill n he hlerarhy Tls s~rltl
1
nty reFrese1ng the s~osJc~lre ~S slmpLfrg tergra lU lttl-lt~S I- lcdr e
1ewly ciscoveed substr-cure is s~iaiized by aire~jng adcli0ral s~rmiddotJ~ re T1S )lta ~
s~=stJcture descees l ~ager nore sptcfic poton of the -If~rt set of llut ltumies --e
suostJctU baclqrunc knowledge rcludes botll user-defined and discovered subs~J
definitions These are used to identify instances of substructures tilat exist In a sIVer set of ili
erampies Chapter 2 concludes by outlining a substructure discovery algortlm Incorporatng ~he
aforementlcned cmponents
Chapter 3 describes the implementation of the substructure discovery nethodoiogy ontained
In the StmiddotBDtmiddotE system In addition to the substructure discovery module StBDLE also lcludes a
substructure specaliZltion module and a substructure background know1edge module After a
substructure is d~oveed the substructure lS specIalized and botil tile original and secaiized
substructures are stored in tle background knowlCdge Within the background iinomiddotlledge the
substructures are kept in a hierarclly tbat defines omplu substructures in terms of more primltve
substructures tpon receiving a new set of input examples the background know~dgt rnodJe
determines which of the stored suostructures are resent in the etamples ThJs as he SLBDLE
system runs a hie3rcllical representation of selected structures found in he env ironnent ~s
constructed within tlle background knowledge modJle
Chapter presents several experiments witll the StBDLE systec EXFements wall tne
s1bstructure discovery module indicate that the suostucture generation and substructure seiecton
processes triorm vell in guiding the search towards more interestln~ substructures Other
experiments incorporating botll the substructure background keowledge module lnd spetallutlon
module demonstate tbe performance improvement obtaned by using revlousiy earned
sulstructures In subsequent discovery tASks The input and output data for each exerlent are
isted in ApperdlI A
Chapter strleys -revious work -elated to suostJcure disc)ery T~lS ~orK ilS ak l be euly 1900s Imiddotlen gestalt psychologists began sucying the InderYI~g ncess5 Lmiddotec i
3
CHAPTER 2
stBSTRUCTtJRE DISCOVERY
Te major focus of t1is work IS to investigate methods for discoveing substructure concepts
in examples Substructure dlScovery is tle process of identfYlng concepts desclbl1g n~ere~tIni
and repetitive -hunks- of structure within the individual elements of a set of structured examples
Once such 3 substructure oncept s discovered the descrIptions of the examples can be Simplified
by replacing all occurences of ~he substructure ith a single form that represents the
substructure The simplified descriptions may then be passed to other learning syst~ms In
lddition tbe discovery process nay be apFlied repetitively to urther simplify the deserlptions or
to build a hierarchical interpreuticn of the uamples in terms of hell subparts
ThIS hapter presents ~he components involved in a cgtmilItatlonal nethod for SiJostrJcture
discovery SKtion 21 disc1sses the motivations for developing suet a method anc he importance
of substructure discovery in the 5eld of artificial intelligence Seton 22 1eftnes a stbn~crWt
along with the components comprising a substructure [etods for geneatlng alte-1ative
substructures are presented 1ft Section 21 and the tK1niques used for inteHigently selecting among
these alternatives are discussed in Section 24 Once an appropriate substructure 15 discovered the
substructure is irurantUud for ncb occur-ence of the substructure in the input examples Sect~on
2 discusses substructure irstantiatl0ft ewly discovered sUOstuetures ~an be ipecialized to
describe larger substruc~ures in the input eumples Section 26 disc~sses suosrlctlre
s~aization Section 27 presents methods for using background lo~ledge in the SIoStrlctlre
dlscovery process Finllly Section 28 outlines a substluure Es-oery alsolthm lncorpcrawg
the previously described t~hni~ues
the observaton By tercelvmg only he appropriate Informatior lumans U~ bener abie to el-
the concepts implied by the environment
Thus the undedyu~ motivations for investigating substructure ~overy are the IblqlJlto~
hlenrchical structure in the world and 1e ease with Ntllch bumans can negotiate the henrchy
With an appropriate representation for substructures and the substJcture hierarchy a
substructure discovery sysem can perform the asks Just presented
22 Substructure Representation
A substructure is both a portion of a collection of structurally-related obects and a
cesclption of the concept represented by that portion For uample the detaied structure
CJmposlng ttle concet of a brIck is a substucure of a brck wall Tle collecion of atoms ard
bones compnslng he concept of an aromatic -ing 5 a substrucure of nany Clrgarllc ~cmpouncs
However substrucure concepts are ~ot always nteresng LPOl encountenng aJr1ck wal the
concept of a brick nay not be as interesting as tohe oncept Jf a doorway or Window T~e task of
SUls-ructure discovery is to ~nd i11lDurirtg subsluctue middotoncepts In a given 5-cicltlon d
structurally-related objects The representation of strllcured oncepts sbould be onducive to tile
wk of discovering substructures This sectior resents a grapblcal repfese~tatlon for suctufee
conceptS and a language for describing tne concepts
A coUlCtion of strJctured objects C3n be epfesentec by 1 sTaph CSlng il gnpl
representation a substJctlre is a collection of annotated nodes and eages comt=rlslng a onne~ed
slt~rlph J a larier jTaph The nodes epresenl single oects aics or eiltltols in the
Slbstuc~le lnd tlle edges -epresent the ion~lCtilrs emiddotmiddoteen ea~cns lnd ler arglrents Fr
annctated Ntb gt and o~e In~tl~ed Nltll gtft Te en odt s rnected c ~tl te a ngtce Jnd te
SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St
elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute
Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e
Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the
nput nampie n Figure 211 IS
lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt
red-I ill
green-i SI ~ ~------------~~~lt- --- OBJECT-0001
) ----Rl
1amp --- OBJECT-0002 amp-blue
I~l 154- red LshyI I
I j
blue blue
(1) I1put Example ()) SubstJcure
Fig1Jre 21 Exa~plt Substrlc~re
9
[nput Example Substlcture
A)
V
(b) Object Overlapping Substructure
(c) Object and Relation Overlapping Substructure
Figure 22 Disjoint and Overlappin Substructures
Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~
~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the
densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle
lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1
factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl
Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e
(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing
subsuIctures and substructure speialization to attach contextual nformation to newiy discovered
substructures
2S Substructure mstampl1tialioD
Once an interesting substructure is disltovered the input eumples can be recast by replactng
the obJects and relations of each occurrence of the substructure with a single entity representing
tbe abstact substructure This replacement is termed substructure illstanliatwlt After
pltrforming one substructure instantiation a substrueture discovery system may continue to
discover more abstract substructures in terms of those already instantiated in the input eumples
This section considers two methods of substrueture instantiation
One method of substructure instantiation replaees eaeb occurrence by a single object
Difficulties in using this method arise from representation roblems Although all the ob]fCts and
relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not
replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be
maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only
confounds the insuntiaion problem In the case of object overlap each instartation of the
overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of
relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as
well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre
instantiation using a singie new object Although single object substructure instantiation seens
intuitively promising tbe accompanying representation problems are difficult to overcome
An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of
the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the
slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the
17
3 SubStructure Generation
AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~
and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l
bottom-up and top-down
The bottom-up approach to substructure generation be~lns with the smallest substrJctures n
the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished
by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0
the substructure For enmple according to tbe tllee neighboring relations (presented in Section
2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the
substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures
lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt
lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt
lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt
The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O
S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For
example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10
suestrJures
ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt
The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie
suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0
smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be
11
lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1
jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_
uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly
dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec
substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1
are applicable here
Regardless of how tae metbods are applied each metbod bas advantages and disadvantages
Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev
SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle
combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a
larger substructure but a smaller more desirable substructure may be overlooked in the process
Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS
referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones
linimal expansion begins with smaller substructures expands substructures along one relation
lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource
imlts of tbe system
2~ Substructure Selection
Aier using tbe metbods of the previous sectlon to construct l set of alternatlve
substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0
onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e
roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie
substructures oased on ~her heuristic quality This section presents the major heuristics that 3re
Jpplicable to substructure evaluation
The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata
cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie
savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e
13
Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad
eiatlons of ~be origmal OCClrre1lces may be reconstructea
T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=
replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla
obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue
ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of
concepts deAned In teClS of the Instantiated substructures
26 Substructure Specialization
Specializing substructures is an essential capabtlity of a substructure discovery system For
Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f
mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences
Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the
aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached
atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this
specialization step allows the substructure discovery system to take advantage of additional
information in the input examples avoid learning overly general substructure concepts and Clore
rapidly discover a specific disjunctive substructure concet T115 section resents an approach to
substructure specialization
One technique for specializing a substructure is to ~rform inductive Inference on the
extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the
substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended
occurrences consists of the substructures obtained by upanding each occurrence llth each
neighboring relation of the occurrence Once the set of eltended occurrences is onstructed
Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added
neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg
relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres
19
2 - Background Knowledampe
Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s
wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e
discovery process along more promising paths through tbe space of alternatlve substructJres T~lS
section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate
substructures that are more likely to list in tbe current application dOClaln
The ~wo major functions of tbe substructure backround knowledge are to main tain boh
lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a
glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of
substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu
substructures are defined in terms of more primitive substructures This arrangement suggests ~n
archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie
substructures serve as the justifications for more complex substructures at a higher level in tle
hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote
substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie
substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures
elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying
tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput
uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are
ultimately justified by the relations in tbe input eumples The substructure discoey roess
may then use these background knowledge substructures as a starting point for ieneratllg
alternative substructures
Other f Jrms of background knowledge apply to the task of substrlcure ilscovery
mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses
~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee
1
L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do
BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do
if (not (member e 0raquo S S U leI
return D
Figure 24 SubstrUcture Discovery ~lgoritbm
Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom
the substructures In S until either the computational limlt L is exceeded or the set of candidate
substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe
heuristics of Section 2A are employed to choose the best substructure from the llternatives in S
The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he
implementation Section 322 descibes the comrutatlon used in StBDlE although otbe
methods may be used For instance the heutlStics could be weighted selected by lhe user or
selected by lhe backcround knowledae However uy substructure selecion method should
involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in
BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of
discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi
Section 23 are then used to construct a set of new substructures that are stored in E Those
subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the
loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res
23
CHAPTER 3
mE SUBDUE SYSTE~I
An implementation of he substructure discovery methodology descrIbed in the frevous
chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments
Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a
substructure concept discoverer and as a mod le in a more robust nachlne learning system In
addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a
sbstructure specialization module and a substructure background knowledge module for utiiizilg
prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd
krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres
ovillCh of these substruc1res are resent in the lnput examples
i
SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of
SubstructJre Speeializer-k of(
C ser-Supplied Input Exampies Substructure Discovery
Figure 31 Tlle SlSDLE System
(a) SUbstructure
[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]
(b) External Representation
I I
~ Sl
i V
~-T t -
I
~ Cl i
(c) Inte~al Represe~tation
Figure 32 Substructure Representation
21
SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do
SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI
return -EWSlBS
Figure 33 Substructure Elpanslon Algorithm
Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae
stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are
stored in For each neighboring relation a new substructure is formed by adding the nelghborrg
-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to
the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een
considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes
~xanded from the original substructure
322 Selection
The heuristic-based substructure discovery module selects for orslderation those
substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs
compactness connectivity and covenge These four heurlstus are used 0 order he set of
alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut
ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~
selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons
~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he
eurstlc value of a substructur
29
urlent set of Input xam-llS
deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e
substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le
Input examples
r rtlations Sl compactnns(S) = ----shy
For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus
conpactness bull 4 bull 1
The third heUristic connectivity measures the amount of extemal connection in 1he
occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of
external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be
onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed
connectivit-Spound) = locel
Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In
Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure
22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO
outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons
Thus connectivity (12 41 bull 13
T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es
31
ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt
First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~
tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as
there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle
substructure as denned in Section 3U The Object names within the substructures le aroltrar
symbols generated by the system for eacb ewly constructed substructure
S bull i lt 11 OBJECT-0001gt f
~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S
and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by
addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5
ae st)red In E
Input Example Substructure
amp i 511
~I Rl I- lSi ~
52 I S I
LJ L
Figure 3-1 Simple Example
33
11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~
1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l
le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~
suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~
by the algorabm
33 Sllbstructure Specialization
The substructure specialiution module In SLBDLE employs t simple teChlllq ue r
s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25
SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added
attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e
)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh
the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e
i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota
eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d
knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e
s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton
rocess
331 Specia1ization Algorithm
Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te
algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~
elamrles
he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~
Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l
osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~
eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e
35
~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor
Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle
mount of specializatlon In Srpc
A-rount_of_spKlaLization( S_) = --_______ (occl
The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture
acKiround knowledge along w1tb the originally discovered substrucure
332 Example
As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a
lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute
-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he
sare substruc~ure emerges as in Figure 3-
S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt
ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule
Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES
A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l
~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli
s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt
Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~
SPECSLBS In thIs example IS
S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt
T~is specialized substructure also las J occurrerces but 3 untque values thus
amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC
specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he
substrucnre background knowledge along with the oriiinally discovered substucur
SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon
about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be
s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred
substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq
SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue
nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization
nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be
mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec
s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks
3 SubstMlcture Background Kllowledge
T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts
39
O T SHA ETRL~GLE SHAPE-SQLARE
Figure 37 Substructure Backgolnd KnowledgeEumpie
ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt
su~ported
As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge
lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle
ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s
SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y
Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute
est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt
he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn
n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and
~1
1--- ~ed v blue ~
i I shyI
I~
I
I
I
I
SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE
Fgure 39 Three Background Knoledge $ucstroJcJres
he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e
new substructu-es In terms of the substructures already known
3 bull 42 IdentifyiD1 Substructures in Eumples
As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-
~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___
~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I
i SH~PETRIAGlE i t
I I
I I I SHAPEmiddotSQlARE
[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]
Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]
Fig-ure 310 Substructure IdentiAcation Eumple
substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles
disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks
--
lll=H Eumple - r
i III
I I H
8r- shyj
V V
Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red
C Value(S)a8
L 6 a
al uelt S)-312
alueltS)-21
0 alue(S)96
middotalue(S)-11
6 I
bt
10 ~ V
- -
I
alue(S)middot31~ aue(S~-96 -- - middotalue(S)92
A I
j
aI ValueS)-312 I
I I
I bull
---
- -I
Vaiue(S)-103
rgure 41 First Etample of EXiIrinent 1
elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS
of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll
EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$
overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0
consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r
be Input examples
2 Ex~riment 2 Specialization and Background Knowledge
The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem
Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0
~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS
lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege
During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task
As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut
discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs
acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa
Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed
subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk
The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure
-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e
cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s
shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese
discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized
substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed
rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces
the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed
substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe
background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to
deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe
background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres
Ii
C 0
H- C C - Ii C1
(Sr C1 rl
C C ~
C C C-H C C- ii C C-Ii 1 i
~ Sr- C C C
H-C C-Ii H C
Ii H
H
fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule
13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples
tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~
of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna
clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn
whIch to focus ile attentIon of the conceptual clusterI~ rocess
A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA
Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer
lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific
Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e
added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features
t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and
CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered
Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe
SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas
r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single
)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d
subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle
substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure
spannng ncre han one ualple
Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni
and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle
53
nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy
descrlOed il [10gensel87J the two best d usterngs discovered are
Color of engine Aheels IS Blackmiddot middotWhitemiddot
W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd
subs-~c~~re discovery =odule is
lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt
In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By
addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI
exa~Fles Je ~wo best lusterngs discovered lre
umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four
Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec
JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure
attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen
har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE
T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~
suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the
StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne
IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree
that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators
within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover
macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~
CiscJsses similrlties and differences between SLBDLE and PLoSD
Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure
of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The
Jperators forths conaln are taken from [isson80] and are repeated e10w
PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY
PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)
STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)
LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)
For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec
e
51
STACK(XZ)
~ CSTACKCYZ) PICKLPOi)
PCTDOW(Y)
Figure 49 Diseovered (alto-operator
system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy
vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be
~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex
45 Eltperiment 5 Data Abstraction and Feature Formation
EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he
cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3
arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng
on the same Tu3S Instruments Eplorer as the SLBDCE system
Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les
given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores
to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in
the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of
178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J
5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C
SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c
61
--
~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll
Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~
hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH
program searCles for tWo tpes of substrucure In tbe blocks world domain The first type
involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a
set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by
the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes
of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall
1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he
ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of
SlBDLE
lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S
cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce
OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence
~ ~ shyB
I ~ (
E G- c c=7 0 IA B CD
i Lr
(a) ( )
63
A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1
0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL
E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL
The tommcm-realiotships-list contairs the four relations Ossessed by more than half the
candidates
Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt
~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le
Sumber of propUiH In Intltwto
Number of propenlH n Ilnlon
vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-
ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre
A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5
65
~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e
ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~
difficult Running both tbe sequence and common relation discovery processes nore tban once
mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl
difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd
domain
T~e final comFarlson of 11e two syseS Involves ~he representation used to store the
discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the
subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe
sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot
lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1
roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse
Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines
comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe
semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto
oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0
SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert
background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l
more general epresentaticn such as the semantic network used y ~he ARCH pro~ran
53 COfDitive Optimization in olrs S~R Program
R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte
~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt
[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S
Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-
-(5-s)C =---shy
s
slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s
represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1
polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a
inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR
seeks a grammar whose eiementS m8llnize eVe
S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n
322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number
of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f
OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings
of he substrucre can be upressed as
Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy
cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not
conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on
he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into
~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture
lmprovements to tbe cognltive savings measure
~ Substructure Oiscovery ill Uihitehalls PLAD System
Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons
These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn
69
The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as
Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs
efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda
overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta
mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence
Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ
COXTEXT 1 ogsav macro OS
100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot
8 5 ~u - (~tmiddot z)middot
Instantiated Action Sequlnce lt B 112 Z ~11J Z
COlEXT 2 ogsav macro~s
20 ~l bull (~lumiddot Z)shy
Instantiated Action ~uence A B 11
Al interesting macrops discovered End
Figure 53 PLAXD Etam~le
1
OLPTER 6
COSC1USION
Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S
of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty
Url about such an environnent the entity nust abstract over unInuresting jetJil and discover
substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s
teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced
domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l
this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve
61 Summary
The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl
onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s
aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e
environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge
equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl
aleratves nvoiec In 3 computational method 01 discovering sustrucure
A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y
he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans
omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg
t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre
imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5
T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~
iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~
he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~
optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~
The ~ialization module modina tile substructure to apply in a more constrained ewironme
Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i
nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S
aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f
be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes
and the possible applications of the SLBDLE substructure discovery system in a vanety f
domaltls
Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf
~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-
lass vf problems in the area of nachine learning OJeraton In real-world domains demands f
eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates
n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll
of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes
Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1
S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~
62 Future Research
$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE
system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei
include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS
Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy
reVIOUS suess of silIlLu rJe appiiauns
esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of
substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can
tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]
lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can
be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions
in both the background knowledge architecture 3nd the opeations peforned by the discovery
algorlthm
Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess
may arise from a better method of substrlctie instantiation C rrent letbocs do not
satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation
by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C
lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres
exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba
Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods
would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery
process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt
the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be
discovered with a single application of the discovery algcnthr1
Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~
uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve
the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery
rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge
z~ess l ~
One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount
knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a
knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts
623 Applications
Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste
SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and
compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung
systems
One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage
composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e
eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon
slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of
abstraction
The n~ to access information fom larse databases has betooe oonlonlac In ncs
omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so
do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver
repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard
Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte
database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture
sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1
9
REFERE~CES
[Doyie79J
[HoaS3]
L --]~ ason
~Iicalski33bl
G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)
J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162
J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27
R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288
K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987
W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933
W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947
J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J
General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16
J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977
D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC
~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239
R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy
R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13
R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363
S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3
T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50
J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp
~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~
83
- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es
DeJ IS 11
dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T
s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11
nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11
race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T
Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order
=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je
~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form
(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I
The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC
The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure
ltela01s s the list ~f ~eatlons deining le substrucure r occurenc
In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage
dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec
85
gt bull ~
)~c1 tt I
0 c5 ~ -
13 Output for First Example of Experiment 1 Limit - 6
3qll rilebullbullbullbull
anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~
Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll
S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi
-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I
(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l
Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD
-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i
S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)
11 aCC
A14 Output for First Example of E(periment 1 Limit 10
gt svIdue liait 10)
n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t
Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco
T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~
S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO
~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS
~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J
A 16 Input for Second Example ot Experiment 1
Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J
conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)
Al7 Output for Second Example ot Etperitnent 1 Limit - 4
l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I
5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs
c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I
middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt
middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I
So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS
coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j
ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103
~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~
89
S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a
ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO
~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H
ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09
lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()
~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull
[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I
l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i
[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~
C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i
SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1
co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i
middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))
Ec ~lce
Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10
Betti ~Ice
i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~
umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1
SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -
1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G
91
s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no
TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~
~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f
=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ
~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i
A2 Experiment 2
Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd
saostructure background knowledge nodule Two examples from the chemical dooain ae
eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng
previously disc ered subst-uc~ures to subsequent discovery tasks
11 Input for First Eumple of poundXperimenl 2
kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i
S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))
slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)
sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J
~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI
Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))
VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)
( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I
(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i
(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I
sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)
Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j
SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)
~o e ) ~Wlnt ubstUC~oltes
Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo
95
gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~
U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l
~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J
carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )
Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))
Defxollrii ilill J ~cIl Clr carJ)
I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)
Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J
ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))
A32 Output for Experiment 3
gt lubclue inl~ JO)
Seli tlcebullbull_
m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil
SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo
aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot
ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~
licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)
Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc
wri OCCtilJtOiCS
101
4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4
~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i
(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))
42 Output for Experiment 4
PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil
Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)
ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl
u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1
O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~
s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI
W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -
~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a
S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1
TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a
103
~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~
Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-
bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)
ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)
I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))
r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l
(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)
(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)
coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)
Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i
(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at
Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e
(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ
~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj
OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs
I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten
~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull
9
SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1
I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1
Addina substructures 0 SKbullbullbull
O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1
5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I
A3 Experimeut 3
Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15
hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le
desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e
99
sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~
5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~
1~Ie c2 lt6 ~u)
QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32
c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)
lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J
SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))
A52 Output for Experiment S
gt (sulxh bullbull lmu )
Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll
RUMinl sulntucture discovey
DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl
Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~
WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11
( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I
(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I
[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt
~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90
Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J
ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot
~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I
10$
bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D
icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)
([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD
((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I
([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~
101
middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a
oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0
CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1
Ed ~lCe
109
T vlLE OF CO-TETS
CHAPTER
1 ITRODlCTIO
2 SlBSTRlcrtRE DISCQER- ~ ~ 5 21 1otivations 6 22 Substructure ReprSentatlon
I
23 Substructure Gone-ation 11 2- Substructure Soeleclon 13 25 Substructure Instantiatlon 17 26 Substructure Specialization 19 21 Background Knowledge 21 2S Substructure Dlsovery ~lgorltln
3 THE SlBDLE SYSTE)( 25 31 Substructure Representation 26 32 Heurlstlc-Based Substructure Discovery 25
321 GeneratIon 25 322 Selection 29 323 ExamFle 32
33 Substructure Specialization 35 331 Specialization Algorithm 35 332 Esampie 37
34 Substructure Background Knowledge 39
3~1 ~rchltecture -10 342 Identifying Substructures in Et~nplts -13
EXPERI(E-n -16
41 Experiment 1 VarYlng the Computational limit -16 ~2 Experiment 2 Specialization and Background Knowledge 19 3 Experiment 3 DisJverln~ Classifying A~trlbues In )lultilie xamplS 53 - Experiment~ Dls~vertng Iacro-Operators in Proof Trees 56 S EXet1ment S Data Abstraction and Feature Forllatlon 59
c REL~TED VORK 62 51 Gestalt Pshoiogy 62
52 Discovenrg Grop5 f Objects In Winstgtnmiddots ARCH P-cgram 63 ~3 Cvintlve OptlrlzJt~n in Wolffs SPR Pcgrar 67 S Substucure Dtsoery In middotbitalis PL-D System 69
CH~TER
6 COCL tSIO
6 1 Suoary () 2 Fmiddottue Resea-cb ~ _ ~
611 Substucture DiSCOvery and InstantIation 6
622 Background Knowledge 75 623 ApplicatIOns 79
REFERECES 51
APPEDIX A EXPERIYIET D-T~ 5amp AI Experiment 1 56 A2 Expenment 2 93
~ 3 Experllcent 3 99
~ Etperiment J 102 AS Experiment 5 104
vi
QPTER 1
LTRODtCI10N
At any Iven moment the amount of detailed information available fr~m an envlronmel s
overwhelmIng For elample dose observation of a brick wall reveals not only tbe rOINS d
rectang11ar brIckS but also tbe mortAr between tbe bricks the pitted surface of the brIcks and
mortar small cracks In tbe bricks etc Yet hUmans bave the ability to Ignore such detail ard
ettract infermation from the environment at a level of detail that is appropriate for he purpose cf
tbe ocservatlon Even n an unfamiliar environment lumans ignore intricate det1il and iscert
more abstract patterns in tle stimuli of the environment [Witkin83] This thesis describes a
com~utatonal mettod for discovermg abstract patterns or substruct~re in the descriptions 0 a
strucured enVrgtnment
Vhen obser atlons at varying levels of detail are necessary ~uruns are caable of descendrg
nto be more minute structure of the environmental stimuli and centlfymg patterns In arms f
hese stl1cural primitlves From the observations at different levels of detail humans na
construct a lle-archical description of the envirlnoent For nstance tbe brIckmiddot loll an be
descrbed as he rectangular bricks in tbe wall along with tbe interconnecons betmiddotveen tle mcilts
Furthermore each rick can be described in terms of the pitS racks and embedded graIns in he
bnck and each interconnecion can be described by the cl11pcnelts oi h~ mortar that ombme ~
form tle IntercoMecion Thus each Ievei in ~he descrption ~ierarchy rerresents I different eel
of detail for represen~ing ~he environment
Once such a bleoarchy is constructed similar prmitive struc res in diffeent middotWlCrmels
nay suggest tle Uistence of mere abstract featres higher ill n he hlerarhy Tls s~rltl
1
nty reFrese1ng the s~osJc~lre ~S slmpLfrg tergra lU lttl-lt~S I- lcdr e
1ewly ciscoveed substr-cure is s~iaiized by aire~jng adcli0ral s~rmiddotJ~ re T1S )lta ~
s~=stJcture descees l ~ager nore sptcfic poton of the -If~rt set of llut ltumies --e
suostJctU baclqrunc knowledge rcludes botll user-defined and discovered subs~J
definitions These are used to identify instances of substructures tilat exist In a sIVer set of ili
erampies Chapter 2 concludes by outlining a substructure discovery algortlm Incorporatng ~he
aforementlcned cmponents
Chapter 3 describes the implementation of the substructure discovery nethodoiogy ontained
In the StmiddotBDtmiddotE system In addition to the substructure discovery module StBDLE also lcludes a
substructure specaliZltion module and a substructure background know1edge module After a
substructure is d~oveed the substructure lS specIalized and botil tile original and secaiized
substructures are stored in tle background knowlCdge Within the background iinomiddotlledge the
substructures are kept in a hierarclly tbat defines omplu substructures in terms of more primltve
substructures tpon receiving a new set of input examples the background know~dgt rnodJe
determines which of the stored suostructures are resent in the etamples ThJs as he SLBDLE
system runs a hie3rcllical representation of selected structures found in he env ironnent ~s
constructed within tlle background knowledge modJle
Chapter presents several experiments witll the StBDLE systec EXFements wall tne
s1bstructure discovery module indicate that the suostucture generation and substructure seiecton
processes triorm vell in guiding the search towards more interestln~ substructures Other
experiments incorporating botll the substructure background keowledge module lnd spetallutlon
module demonstate tbe performance improvement obtaned by using revlousiy earned
sulstructures In subsequent discovery tASks The input and output data for each exerlent are
isted in ApperdlI A
Chapter strleys -revious work -elated to suostJcure disc)ery T~lS ~orK ilS ak l be euly 1900s Imiddotlen gestalt psychologists began sucying the InderYI~g ncess5 Lmiddotec i
3
CHAPTER 2
stBSTRUCTtJRE DISCOVERY
Te major focus of t1is work IS to investigate methods for discoveing substructure concepts
in examples Substructure dlScovery is tle process of identfYlng concepts desclbl1g n~ere~tIni
and repetitive -hunks- of structure within the individual elements of a set of structured examples
Once such 3 substructure oncept s discovered the descrIptions of the examples can be Simplified
by replacing all occurences of ~he substructure ith a single form that represents the
substructure The simplified descriptions may then be passed to other learning syst~ms In
lddition tbe discovery process nay be apFlied repetitively to urther simplify the deserlptions or
to build a hierarchical interpreuticn of the uamples in terms of hell subparts
ThIS hapter presents ~he components involved in a cgtmilItatlonal nethod for SiJostrJcture
discovery SKtion 21 disc1sses the motivations for developing suet a method anc he importance
of substructure discovery in the 5eld of artificial intelligence Seton 22 1eftnes a stbn~crWt
along with the components comprising a substructure [etods for geneatlng alte-1ative
substructures are presented 1ft Section 21 and the tK1niques used for inteHigently selecting among
these alternatives are discussed in Section 24 Once an appropriate substructure 15 discovered the
substructure is irurantUud for ncb occur-ence of the substructure in the input examples Sect~on
2 discusses substructure irstantiatl0ft ewly discovered sUOstuetures ~an be ipecialized to
describe larger substruc~ures in the input eumples Section 26 disc~sses suosrlctlre
s~aization Section 27 presents methods for using background lo~ledge in the SIoStrlctlre
dlscovery process Finllly Section 28 outlines a substluure Es-oery alsolthm lncorpcrawg
the previously described t~hni~ues
the observaton By tercelvmg only he appropriate Informatior lumans U~ bener abie to el-
the concepts implied by the environment
Thus the undedyu~ motivations for investigating substructure ~overy are the IblqlJlto~
hlenrchical structure in the world and 1e ease with Ntllch bumans can negotiate the henrchy
With an appropriate representation for substructures and the substJcture hierarchy a
substructure discovery sysem can perform the asks Just presented
22 Substructure Representation
A substructure is both a portion of a collection of structurally-related obects and a
cesclption of the concept represented by that portion For uample the detaied structure
CJmposlng ttle concet of a brIck is a substucure of a brck wall Tle collecion of atoms ard
bones compnslng he concept of an aromatic -ing 5 a substrucure of nany Clrgarllc ~cmpouncs
However substrucure concepts are ~ot always nteresng LPOl encountenng aJr1ck wal the
concept of a brick nay not be as interesting as tohe oncept Jf a doorway or Window T~e task of
SUls-ructure discovery is to ~nd i11lDurirtg subsluctue middotoncepts In a given 5-cicltlon d
structurally-related objects The representation of strllcured oncepts sbould be onducive to tile
wk of discovering substructures This sectior resents a grapblcal repfese~tatlon for suctufee
conceptS and a language for describing tne concepts
A coUlCtion of strJctured objects C3n be epfesentec by 1 sTaph CSlng il gnpl
representation a substJctlre is a collection of annotated nodes and eages comt=rlslng a onne~ed
slt~rlph J a larier jTaph The nodes epresenl single oects aics or eiltltols in the
Slbstuc~le lnd tlle edges -epresent the ion~lCtilrs emiddotmiddoteen ea~cns lnd ler arglrents Fr
annctated Ntb gt and o~e In~tl~ed Nltll gtft Te en odt s rnected c ~tl te a ngtce Jnd te
SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St
elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute
Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e
Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the
nput nampie n Figure 211 IS
lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt
red-I ill
green-i SI ~ ~------------~~~lt- --- OBJECT-0001
) ----Rl
1amp --- OBJECT-0002 amp-blue
I~l 154- red LshyI I
I j
blue blue
(1) I1put Example ()) SubstJcure
Fig1Jre 21 Exa~plt Substrlc~re
9
[nput Example Substlcture
A)
V
(b) Object Overlapping Substructure
(c) Object and Relation Overlapping Substructure
Figure 22 Disjoint and Overlappin Substructures
Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~
~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the
densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle
lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1
factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl
Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e
(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing
subsuIctures and substructure speialization to attach contextual nformation to newiy discovered
substructures
2S Substructure mstampl1tialioD
Once an interesting substructure is disltovered the input eumples can be recast by replactng
the obJects and relations of each occurrence of the substructure with a single entity representing
tbe abstact substructure This replacement is termed substructure illstanliatwlt After
pltrforming one substructure instantiation a substrueture discovery system may continue to
discover more abstract substructures in terms of those already instantiated in the input eumples
This section considers two methods of substrueture instantiation
One method of substructure instantiation replaees eaeb occurrence by a single object
Difficulties in using this method arise from representation roblems Although all the ob]fCts and
relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not
replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be
maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only
confounds the insuntiaion problem In the case of object overlap each instartation of the
overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of
relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as
well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre
instantiation using a singie new object Although single object substructure instantiation seens
intuitively promising tbe accompanying representation problems are difficult to overcome
An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of
the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the
slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the
17
3 SubStructure Generation
AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~
and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l
bottom-up and top-down
The bottom-up approach to substructure generation be~lns with the smallest substrJctures n
the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished
by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0
the substructure For enmple according to tbe tllee neighboring relations (presented in Section
2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the
substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures
lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt
lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt
lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt
The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O
S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For
example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10
suestrJures
ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt
The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie
suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0
smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be
11
lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1
jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_
uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly
dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec
substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1
are applicable here
Regardless of how tae metbods are applied each metbod bas advantages and disadvantages
Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev
SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle
combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a
larger substructure but a smaller more desirable substructure may be overlooked in the process
Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS
referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones
linimal expansion begins with smaller substructures expands substructures along one relation
lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource
imlts of tbe system
2~ Substructure Selection
Aier using tbe metbods of the previous sectlon to construct l set of alternatlve
substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0
onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e
roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie
substructures oased on ~her heuristic quality This section presents the major heuristics that 3re
Jpplicable to substructure evaluation
The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata
cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie
savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e
13
Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad
eiatlons of ~be origmal OCClrre1lces may be reconstructea
T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=
replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla
obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue
ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of
concepts deAned In teClS of the Instantiated substructures
26 Substructure Specialization
Specializing substructures is an essential capabtlity of a substructure discovery system For
Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f
mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences
Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the
aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached
atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this
specialization step allows the substructure discovery system to take advantage of additional
information in the input examples avoid learning overly general substructure concepts and Clore
rapidly discover a specific disjunctive substructure concet T115 section resents an approach to
substructure specialization
One technique for specializing a substructure is to ~rform inductive Inference on the
extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the
substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended
occurrences consists of the substructures obtained by upanding each occurrence llth each
neighboring relation of the occurrence Once the set of eltended occurrences is onstructed
Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added
neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg
relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres
19
2 - Background Knowledampe
Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s
wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e
discovery process along more promising paths through tbe space of alternatlve substructJres T~lS
section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate
substructures that are more likely to list in tbe current application dOClaln
The ~wo major functions of tbe substructure backround knowledge are to main tain boh
lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a
glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of
substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu
substructures are defined in terms of more primitive substructures This arrangement suggests ~n
archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie
substructures serve as the justifications for more complex substructures at a higher level in tle
hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote
substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie
substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures
elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying
tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput
uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are
ultimately justified by the relations in tbe input eumples The substructure discoey roess
may then use these background knowledge substructures as a starting point for ieneratllg
alternative substructures
Other f Jrms of background knowledge apply to the task of substrlcure ilscovery
mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses
~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee
1
L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do
BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do
if (not (member e 0raquo S S U leI
return D
Figure 24 SubstrUcture Discovery ~lgoritbm
Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom
the substructures In S until either the computational limlt L is exceeded or the set of candidate
substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe
heuristics of Section 2A are employed to choose the best substructure from the llternatives in S
The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he
implementation Section 322 descibes the comrutatlon used in StBDlE although otbe
methods may be used For instance the heutlStics could be weighted selected by lhe user or
selected by lhe backcround knowledae However uy substructure selecion method should
involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in
BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of
discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi
Section 23 are then used to construct a set of new substructures that are stored in E Those
subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the
loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res
23
CHAPTER 3
mE SUBDUE SYSTE~I
An implementation of he substructure discovery methodology descrIbed in the frevous
chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments
Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a
substructure concept discoverer and as a mod le in a more robust nachlne learning system In
addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a
sbstructure specialization module and a substructure background knowledge module for utiiizilg
prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd
krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres
ovillCh of these substruc1res are resent in the lnput examples
i
SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of
SubstructJre Speeializer-k of(
C ser-Supplied Input Exampies Substructure Discovery
Figure 31 Tlle SlSDLE System
(a) SUbstructure
[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]
(b) External Representation
I I
~ Sl
i V
~-T t -
I
~ Cl i
(c) Inte~al Represe~tation
Figure 32 Substructure Representation
21
SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do
SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI
return -EWSlBS
Figure 33 Substructure Elpanslon Algorithm
Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae
stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are
stored in For each neighboring relation a new substructure is formed by adding the nelghborrg
-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to
the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een
considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes
~xanded from the original substructure
322 Selection
The heuristic-based substructure discovery module selects for orslderation those
substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs
compactness connectivity and covenge These four heurlstus are used 0 order he set of
alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut
ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~
selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons
~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he
eurstlc value of a substructur
29
urlent set of Input xam-llS
deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e
substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le
Input examples
r rtlations Sl compactnns(S) = ----shy
For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus
conpactness bull 4 bull 1
The third heUristic connectivity measures the amount of extemal connection in 1he
occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of
external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be
onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed
connectivit-Spound) = locel
Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In
Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure
22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO
outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons
Thus connectivity (12 41 bull 13
T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es
31
ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt
First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~
tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as
there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle
substructure as denned in Section 3U The Object names within the substructures le aroltrar
symbols generated by the system for eacb ewly constructed substructure
S bull i lt 11 OBJECT-0001gt f
~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S
and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by
addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5
ae st)red In E
Input Example Substructure
amp i 511
~I Rl I- lSi ~
52 I S I
LJ L
Figure 3-1 Simple Example
33
11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~
1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l
le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~
suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~
by the algorabm
33 Sllbstructure Specialization
The substructure specialiution module In SLBDLE employs t simple teChlllq ue r
s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25
SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added
attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e
)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh
the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e
i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota
eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d
knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e
s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton
rocess
331 Specia1ization Algorithm
Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te
algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~
elamrles
he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~
Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l
osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~
eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e
35
~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor
Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle
mount of specializatlon In Srpc
A-rount_of_spKlaLization( S_) = --_______ (occl
The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture
acKiround knowledge along w1tb the originally discovered substrucure
332 Example
As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a
lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute
-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he
sare substruc~ure emerges as in Figure 3-
S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt
ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule
Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES
A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l
~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli
s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt
Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~
SPECSLBS In thIs example IS
S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt
T~is specialized substructure also las J occurrerces but 3 untque values thus
amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC
specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he
substrucnre background knowledge along with the oriiinally discovered substucur
SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon
about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be
s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred
substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq
SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue
nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization
nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be
mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec
s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks
3 SubstMlcture Background Kllowledge
T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts
39
O T SHA ETRL~GLE SHAPE-SQLARE
Figure 37 Substructure Backgolnd KnowledgeEumpie
ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt
su~ported
As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge
lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle
ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s
SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y
Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute
est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt
he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn
n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and
~1
1--- ~ed v blue ~
i I shyI
I~
I
I
I
I
SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE
Fgure 39 Three Background Knoledge $ucstroJcJres
he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e
new substructu-es In terms of the substructures already known
3 bull 42 IdentifyiD1 Substructures in Eumples
As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-
~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___
~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I
i SH~PETRIAGlE i t
I I
I I I SHAPEmiddotSQlARE
[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]
Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]
Fig-ure 310 Substructure IdentiAcation Eumple
substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles
disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks
--
lll=H Eumple - r
i III
I I H
8r- shyj
V V
Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red
C Value(S)a8
L 6 a
al uelt S)-312
alueltS)-21
0 alue(S)96
middotalue(S)-11
6 I
bt
10 ~ V
- -
I
alue(S)middot31~ aue(S~-96 -- - middotalue(S)92
A I
j
aI ValueS)-312 I
I I
I bull
---
- -I
Vaiue(S)-103
rgure 41 First Etample of EXiIrinent 1
elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS
of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll
EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$
overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0
consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r
be Input examples
2 Ex~riment 2 Specialization and Background Knowledge
The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem
Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0
~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS
lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege
During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task
As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut
discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs
acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa
Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed
subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk
The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure
-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e
cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s
shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese
discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized
substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed
rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces
the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed
substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe
background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to
deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe
background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres
Ii
C 0
H- C C - Ii C1
(Sr C1 rl
C C ~
C C C-H C C- ii C C-Ii 1 i
~ Sr- C C C
H-C C-Ii H C
Ii H
H
fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule
13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples
tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~
of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna
clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn
whIch to focus ile attentIon of the conceptual clusterI~ rocess
A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA
Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer
lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific
Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e
added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features
t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and
CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered
Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe
SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas
r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single
)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d
subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle
substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure
spannng ncre han one ualple
Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni
and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle
53
nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy
descrlOed il [10gensel87J the two best d usterngs discovered are
Color of engine Aheels IS Blackmiddot middotWhitemiddot
W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd
subs-~c~~re discovery =odule is
lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt
In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By
addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI
exa~Fles Je ~wo best lusterngs discovered lre
umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four
Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec
JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure
attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen
har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE
T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~
suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the
StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne
IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree
that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators
within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover
macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~
CiscJsses similrlties and differences between SLBDLE and PLoSD
Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure
of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The
Jperators forths conaln are taken from [isson80] and are repeated e10w
PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY
PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)
STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)
LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)
For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec
e
51
STACK(XZ)
~ CSTACKCYZ) PICKLPOi)
PCTDOW(Y)
Figure 49 Diseovered (alto-operator
system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy
vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be
~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex
45 Eltperiment 5 Data Abstraction and Feature Formation
EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he
cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3
arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng
on the same Tu3S Instruments Eplorer as the SLBDCE system
Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les
given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores
to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in
the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of
178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J
5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C
SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c
61
--
~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll
Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~
hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH
program searCles for tWo tpes of substrucure In tbe blocks world domain The first type
involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a
set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by
the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes
of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall
1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he
ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of
SlBDLE
lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S
cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce
OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence
~ ~ shyB
I ~ (
E G- c c=7 0 IA B CD
i Lr
(a) ( )
63
A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1
0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL
E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL
The tommcm-realiotships-list contairs the four relations Ossessed by more than half the
candidates
Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt
~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le
Sumber of propUiH In Intltwto
Number of propenlH n Ilnlon
vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-
ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre
A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5
65
~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e
ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~
difficult Running both tbe sequence and common relation discovery processes nore tban once
mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl
difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd
domain
T~e final comFarlson of 11e two syseS Involves ~he representation used to store the
discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the
subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe
sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot
lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1
roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse
Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines
comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe
semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto
oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0
SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert
background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l
more general epresentaticn such as the semantic network used y ~he ARCH pro~ran
53 COfDitive Optimization in olrs S~R Program
R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte
~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt
[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S
Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-
-(5-s)C =---shy
s
slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s
represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1
polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a
inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR
seeks a grammar whose eiementS m8llnize eVe
S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n
322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number
of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f
OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings
of he substrucre can be upressed as
Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy
cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not
conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on
he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into
~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture
lmprovements to tbe cognltive savings measure
~ Substructure Oiscovery ill Uihitehalls PLAD System
Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons
These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn
69
The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as
Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs
efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda
overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta
mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence
Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ
COXTEXT 1 ogsav macro OS
100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot
8 5 ~u - (~tmiddot z)middot
Instantiated Action Sequlnce lt B 112 Z ~11J Z
COlEXT 2 ogsav macro~s
20 ~l bull (~lumiddot Z)shy
Instantiated Action ~uence A B 11
Al interesting macrops discovered End
Figure 53 PLAXD Etam~le
1
OLPTER 6
COSC1USION
Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S
of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty
Url about such an environnent the entity nust abstract over unInuresting jetJil and discover
substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s
teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced
domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l
this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve
61 Summary
The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl
onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s
aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e
environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge
equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl
aleratves nvoiec In 3 computational method 01 discovering sustrucure
A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y
he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans
omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg
t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre
imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5
T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~
iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~
he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~
optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~
The ~ialization module modina tile substructure to apply in a more constrained ewironme
Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i
nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S
aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f
be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes
and the possible applications of the SLBDLE substructure discovery system in a vanety f
domaltls
Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf
~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-
lass vf problems in the area of nachine learning OJeraton In real-world domains demands f
eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates
n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll
of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes
Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1
S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~
62 Future Research
$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE
system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei
include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS
Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy
reVIOUS suess of silIlLu rJe appiiauns
esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of
substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can
tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]
lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can
be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions
in both the background knowledge architecture 3nd the opeations peforned by the discovery
algorlthm
Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess
may arise from a better method of substrlctie instantiation C rrent letbocs do not
satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation
by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C
lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres
exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba
Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods
would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery
process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt
the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be
discovered with a single application of the discovery algcnthr1
Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~
uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve
the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery
rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge
z~ess l ~
One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount
knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a
knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts
623 Applications
Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste
SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and
compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung
systems
One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage
composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e
eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon
slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of
abstraction
The n~ to access information fom larse databases has betooe oonlonlac In ncs
omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so
do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver
repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard
Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte
database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture
sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1
9
REFERE~CES
[Doyie79J
[HoaS3]
L --]~ ason
~Iicalski33bl
G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)
J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162
J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27
R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288
K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987
W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933
W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947
J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J
General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16
J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977
D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC
~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239
R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy
R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13
R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363
S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3
T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50
J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp
~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~
83
- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es
DeJ IS 11
dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T
s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11
nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11
race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T
Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order
=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je
~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form
(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I
The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC
The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure
ltela01s s the list ~f ~eatlons deining le substrucure r occurenc
In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage
dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec
85
gt bull ~
)~c1 tt I
0 c5 ~ -
13 Output for First Example of Experiment 1 Limit - 6
3qll rilebullbullbullbull
anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~
Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll
S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi
-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I
(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l
Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD
-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i
S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)
11 aCC
A14 Output for First Example of E(periment 1 Limit 10
gt svIdue liait 10)
n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t
Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco
T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~
S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO
~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS
~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J
A 16 Input for Second Example ot Experiment 1
Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J
conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)
Al7 Output for Second Example ot Etperitnent 1 Limit - 4
l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I
5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs
c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I
middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt
middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I
So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS
coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j
ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103
~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~
89
S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a
ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO
~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H
ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09
lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()
~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull
[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I
l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i
[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~
C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i
SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1
co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i
middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))
Ec ~lce
Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10
Betti ~Ice
i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~
umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1
SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -
1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G
91
s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no
TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~
~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f
=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ
~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i
A2 Experiment 2
Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd
saostructure background knowledge nodule Two examples from the chemical dooain ae
eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng
previously disc ered subst-uc~ures to subsequent discovery tasks
11 Input for First Eumple of poundXperimenl 2
kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i
S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))
slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)
sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J
~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI
Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))
VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)
( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I
(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i
(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I
sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)
Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j
SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)
~o e ) ~Wlnt ubstUC~oltes
Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo
95
gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~
U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l
~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J
carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )
Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))
Defxollrii ilill J ~cIl Clr carJ)
I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)
Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J
ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))
A32 Output for Experiment 3
gt lubclue inl~ JO)
Seli tlcebullbull_
m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil
SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo
aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot
ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~
licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)
Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc
wri OCCtilJtOiCS
101
4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4
~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i
(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))
42 Output for Experiment 4
PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil
Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)
ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl
u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1
O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~
s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI
W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -
~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a
S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1
TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a
103
~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~
Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-
bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)
ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)
I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))
r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l
(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)
(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)
coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)
Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i
(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at
Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e
(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ
~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj
OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs
I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten
~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull
9
SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1
I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1
Addina substructures 0 SKbullbullbull
O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1
5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I
A3 Experimeut 3
Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15
hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le
desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e
99
sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~
5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~
1~Ie c2 lt6 ~u)
QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32
c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)
lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J
SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))
A52 Output for Experiment S
gt (sulxh bullbull lmu )
Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll
RUMinl sulntucture discovey
DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl
Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~
WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11
( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I
(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I
[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt
~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90
Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J
ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot
~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I
10$
bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D
icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)
([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD
((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I
([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~
101
middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a
oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0
CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1
Ed ~lCe
109
CH~TER
6 COCL tSIO
6 1 Suoary () 2 Fmiddottue Resea-cb ~ _ ~
611 Substucture DiSCOvery and InstantIation 6
622 Background Knowledge 75 623 ApplicatIOns 79
REFERECES 51
APPEDIX A EXPERIYIET D-T~ 5amp AI Experiment 1 56 A2 Expenment 2 93
~ 3 Experllcent 3 99
~ Etperiment J 102 AS Experiment 5 104
vi
QPTER 1
LTRODtCI10N
At any Iven moment the amount of detailed information available fr~m an envlronmel s
overwhelmIng For elample dose observation of a brick wall reveals not only tbe rOINS d
rectang11ar brIckS but also tbe mortAr between tbe bricks the pitted surface of the brIcks and
mortar small cracks In tbe bricks etc Yet hUmans bave the ability to Ignore such detail ard
ettract infermation from the environment at a level of detail that is appropriate for he purpose cf
tbe ocservatlon Even n an unfamiliar environment lumans ignore intricate det1il and iscert
more abstract patterns in tle stimuli of the environment [Witkin83] This thesis describes a
com~utatonal mettod for discovermg abstract patterns or substruct~re in the descriptions 0 a
strucured enVrgtnment
Vhen obser atlons at varying levels of detail are necessary ~uruns are caable of descendrg
nto be more minute structure of the environmental stimuli and centlfymg patterns In arms f
hese stl1cural primitlves From the observations at different levels of detail humans na
construct a lle-archical description of the envirlnoent For nstance tbe brIckmiddot loll an be
descrbed as he rectangular bricks in tbe wall along with tbe interconnecons betmiddotveen tle mcilts
Furthermore each rick can be described in terms of the pitS racks and embedded graIns in he
bnck and each interconnecion can be described by the cl11pcnelts oi h~ mortar that ombme ~
form tle IntercoMecion Thus each Ievei in ~he descrption ~ierarchy rerresents I different eel
of detail for represen~ing ~he environment
Once such a bleoarchy is constructed similar prmitive struc res in diffeent middotWlCrmels
nay suggest tle Uistence of mere abstract featres higher ill n he hlerarhy Tls s~rltl
1
nty reFrese1ng the s~osJc~lre ~S slmpLfrg tergra lU lttl-lt~S I- lcdr e
1ewly ciscoveed substr-cure is s~iaiized by aire~jng adcli0ral s~rmiddotJ~ re T1S )lta ~
s~=stJcture descees l ~ager nore sptcfic poton of the -If~rt set of llut ltumies --e
suostJctU baclqrunc knowledge rcludes botll user-defined and discovered subs~J
definitions These are used to identify instances of substructures tilat exist In a sIVer set of ili
erampies Chapter 2 concludes by outlining a substructure discovery algortlm Incorporatng ~he
aforementlcned cmponents
Chapter 3 describes the implementation of the substructure discovery nethodoiogy ontained
In the StmiddotBDtmiddotE system In addition to the substructure discovery module StBDLE also lcludes a
substructure specaliZltion module and a substructure background know1edge module After a
substructure is d~oveed the substructure lS specIalized and botil tile original and secaiized
substructures are stored in tle background knowlCdge Within the background iinomiddotlledge the
substructures are kept in a hierarclly tbat defines omplu substructures in terms of more primltve
substructures tpon receiving a new set of input examples the background know~dgt rnodJe
determines which of the stored suostructures are resent in the etamples ThJs as he SLBDLE
system runs a hie3rcllical representation of selected structures found in he env ironnent ~s
constructed within tlle background knowledge modJle
Chapter presents several experiments witll the StBDLE systec EXFements wall tne
s1bstructure discovery module indicate that the suostucture generation and substructure seiecton
processes triorm vell in guiding the search towards more interestln~ substructures Other
experiments incorporating botll the substructure background keowledge module lnd spetallutlon
module demonstate tbe performance improvement obtaned by using revlousiy earned
sulstructures In subsequent discovery tASks The input and output data for each exerlent are
isted in ApperdlI A
Chapter strleys -revious work -elated to suostJcure disc)ery T~lS ~orK ilS ak l be euly 1900s Imiddotlen gestalt psychologists began sucying the InderYI~g ncess5 Lmiddotec i
3
CHAPTER 2
stBSTRUCTtJRE DISCOVERY
Te major focus of t1is work IS to investigate methods for discoveing substructure concepts
in examples Substructure dlScovery is tle process of identfYlng concepts desclbl1g n~ere~tIni
and repetitive -hunks- of structure within the individual elements of a set of structured examples
Once such 3 substructure oncept s discovered the descrIptions of the examples can be Simplified
by replacing all occurences of ~he substructure ith a single form that represents the
substructure The simplified descriptions may then be passed to other learning syst~ms In
lddition tbe discovery process nay be apFlied repetitively to urther simplify the deserlptions or
to build a hierarchical interpreuticn of the uamples in terms of hell subparts
ThIS hapter presents ~he components involved in a cgtmilItatlonal nethod for SiJostrJcture
discovery SKtion 21 disc1sses the motivations for developing suet a method anc he importance
of substructure discovery in the 5eld of artificial intelligence Seton 22 1eftnes a stbn~crWt
along with the components comprising a substructure [etods for geneatlng alte-1ative
substructures are presented 1ft Section 21 and the tK1niques used for inteHigently selecting among
these alternatives are discussed in Section 24 Once an appropriate substructure 15 discovered the
substructure is irurantUud for ncb occur-ence of the substructure in the input examples Sect~on
2 discusses substructure irstantiatl0ft ewly discovered sUOstuetures ~an be ipecialized to
describe larger substruc~ures in the input eumples Section 26 disc~sses suosrlctlre
s~aization Section 27 presents methods for using background lo~ledge in the SIoStrlctlre
dlscovery process Finllly Section 28 outlines a substluure Es-oery alsolthm lncorpcrawg
the previously described t~hni~ues
the observaton By tercelvmg only he appropriate Informatior lumans U~ bener abie to el-
the concepts implied by the environment
Thus the undedyu~ motivations for investigating substructure ~overy are the IblqlJlto~
hlenrchical structure in the world and 1e ease with Ntllch bumans can negotiate the henrchy
With an appropriate representation for substructures and the substJcture hierarchy a
substructure discovery sysem can perform the asks Just presented
22 Substructure Representation
A substructure is both a portion of a collection of structurally-related obects and a
cesclption of the concept represented by that portion For uample the detaied structure
CJmposlng ttle concet of a brIck is a substucure of a brck wall Tle collecion of atoms ard
bones compnslng he concept of an aromatic -ing 5 a substrucure of nany Clrgarllc ~cmpouncs
However substrucure concepts are ~ot always nteresng LPOl encountenng aJr1ck wal the
concept of a brick nay not be as interesting as tohe oncept Jf a doorway or Window T~e task of
SUls-ructure discovery is to ~nd i11lDurirtg subsluctue middotoncepts In a given 5-cicltlon d
structurally-related objects The representation of strllcured oncepts sbould be onducive to tile
wk of discovering substructures This sectior resents a grapblcal repfese~tatlon for suctufee
conceptS and a language for describing tne concepts
A coUlCtion of strJctured objects C3n be epfesentec by 1 sTaph CSlng il gnpl
representation a substJctlre is a collection of annotated nodes and eages comt=rlslng a onne~ed
slt~rlph J a larier jTaph The nodes epresenl single oects aics or eiltltols in the
Slbstuc~le lnd tlle edges -epresent the ion~lCtilrs emiddotmiddoteen ea~cns lnd ler arglrents Fr
annctated Ntb gt and o~e In~tl~ed Nltll gtft Te en odt s rnected c ~tl te a ngtce Jnd te
SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St
elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute
Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e
Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the
nput nampie n Figure 211 IS
lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt
red-I ill
green-i SI ~ ~------------~~~lt- --- OBJECT-0001
) ----Rl
1amp --- OBJECT-0002 amp-blue
I~l 154- red LshyI I
I j
blue blue
(1) I1put Example ()) SubstJcure
Fig1Jre 21 Exa~plt Substrlc~re
9
[nput Example Substlcture
A)
V
(b) Object Overlapping Substructure
(c) Object and Relation Overlapping Substructure
Figure 22 Disjoint and Overlappin Substructures
Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~
~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the
densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle
lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1
factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl
Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e
(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing
subsuIctures and substructure speialization to attach contextual nformation to newiy discovered
substructures
2S Substructure mstampl1tialioD
Once an interesting substructure is disltovered the input eumples can be recast by replactng
the obJects and relations of each occurrence of the substructure with a single entity representing
tbe abstact substructure This replacement is termed substructure illstanliatwlt After
pltrforming one substructure instantiation a substrueture discovery system may continue to
discover more abstract substructures in terms of those already instantiated in the input eumples
This section considers two methods of substrueture instantiation
One method of substructure instantiation replaees eaeb occurrence by a single object
Difficulties in using this method arise from representation roblems Although all the ob]fCts and
relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not
replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be
maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only
confounds the insuntiaion problem In the case of object overlap each instartation of the
overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of
relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as
well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre
instantiation using a singie new object Although single object substructure instantiation seens
intuitively promising tbe accompanying representation problems are difficult to overcome
An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of
the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the
slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the
17
3 SubStructure Generation
AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~
and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l
bottom-up and top-down
The bottom-up approach to substructure generation be~lns with the smallest substrJctures n
the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished
by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0
the substructure For enmple according to tbe tllee neighboring relations (presented in Section
2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the
substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures
lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt
lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt
lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt
The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O
S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For
example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10
suestrJures
ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt
The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie
suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0
smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be
11
lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1
jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_
uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly
dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec
substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1
are applicable here
Regardless of how tae metbods are applied each metbod bas advantages and disadvantages
Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev
SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle
combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a
larger substructure but a smaller more desirable substructure may be overlooked in the process
Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS
referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones
linimal expansion begins with smaller substructures expands substructures along one relation
lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource
imlts of tbe system
2~ Substructure Selection
Aier using tbe metbods of the previous sectlon to construct l set of alternatlve
substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0
onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e
roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie
substructures oased on ~her heuristic quality This section presents the major heuristics that 3re
Jpplicable to substructure evaluation
The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata
cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie
savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e
13
Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad
eiatlons of ~be origmal OCClrre1lces may be reconstructea
T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=
replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla
obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue
ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of
concepts deAned In teClS of the Instantiated substructures
26 Substructure Specialization
Specializing substructures is an essential capabtlity of a substructure discovery system For
Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f
mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences
Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the
aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached
atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this
specialization step allows the substructure discovery system to take advantage of additional
information in the input examples avoid learning overly general substructure concepts and Clore
rapidly discover a specific disjunctive substructure concet T115 section resents an approach to
substructure specialization
One technique for specializing a substructure is to ~rform inductive Inference on the
extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the
substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended
occurrences consists of the substructures obtained by upanding each occurrence llth each
neighboring relation of the occurrence Once the set of eltended occurrences is onstructed
Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added
neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg
relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres
19
2 - Background Knowledampe
Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s
wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e
discovery process along more promising paths through tbe space of alternatlve substructJres T~lS
section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate
substructures that are more likely to list in tbe current application dOClaln
The ~wo major functions of tbe substructure backround knowledge are to main tain boh
lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a
glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of
substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu
substructures are defined in terms of more primitive substructures This arrangement suggests ~n
archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie
substructures serve as the justifications for more complex substructures at a higher level in tle
hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote
substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie
substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures
elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying
tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput
uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are
ultimately justified by the relations in tbe input eumples The substructure discoey roess
may then use these background knowledge substructures as a starting point for ieneratllg
alternative substructures
Other f Jrms of background knowledge apply to the task of substrlcure ilscovery
mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses
~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee
1
L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do
BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do
if (not (member e 0raquo S S U leI
return D
Figure 24 SubstrUcture Discovery ~lgoritbm
Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom
the substructures In S until either the computational limlt L is exceeded or the set of candidate
substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe
heuristics of Section 2A are employed to choose the best substructure from the llternatives in S
The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he
implementation Section 322 descibes the comrutatlon used in StBDlE although otbe
methods may be used For instance the heutlStics could be weighted selected by lhe user or
selected by lhe backcround knowledae However uy substructure selecion method should
involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in
BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of
discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi
Section 23 are then used to construct a set of new substructures that are stored in E Those
subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the
loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res
23
CHAPTER 3
mE SUBDUE SYSTE~I
An implementation of he substructure discovery methodology descrIbed in the frevous
chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments
Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a
substructure concept discoverer and as a mod le in a more robust nachlne learning system In
addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a
sbstructure specialization module and a substructure background knowledge module for utiiizilg
prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd
krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres
ovillCh of these substruc1res are resent in the lnput examples
i
SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of
SubstructJre Speeializer-k of(
C ser-Supplied Input Exampies Substructure Discovery
Figure 31 Tlle SlSDLE System
(a) SUbstructure
[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]
(b) External Representation
I I
~ Sl
i V
~-T t -
I
~ Cl i
(c) Inte~al Represe~tation
Figure 32 Substructure Representation
21
SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do
SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI
return -EWSlBS
Figure 33 Substructure Elpanslon Algorithm
Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae
stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are
stored in For each neighboring relation a new substructure is formed by adding the nelghborrg
-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to
the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een
considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes
~xanded from the original substructure
322 Selection
The heuristic-based substructure discovery module selects for orslderation those
substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs
compactness connectivity and covenge These four heurlstus are used 0 order he set of
alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut
ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~
selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons
~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he
eurstlc value of a substructur
29
urlent set of Input xam-llS
deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e
substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le
Input examples
r rtlations Sl compactnns(S) = ----shy
For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus
conpactness bull 4 bull 1
The third heUristic connectivity measures the amount of extemal connection in 1he
occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of
external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be
onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed
connectivit-Spound) = locel
Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In
Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure
22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO
outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons
Thus connectivity (12 41 bull 13
T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es
31
ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt
First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~
tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as
there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle
substructure as denned in Section 3U The Object names within the substructures le aroltrar
symbols generated by the system for eacb ewly constructed substructure
S bull i lt 11 OBJECT-0001gt f
~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S
and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by
addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5
ae st)red In E
Input Example Substructure
amp i 511
~I Rl I- lSi ~
52 I S I
LJ L
Figure 3-1 Simple Example
33
11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~
1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l
le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~
suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~
by the algorabm
33 Sllbstructure Specialization
The substructure specialiution module In SLBDLE employs t simple teChlllq ue r
s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25
SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added
attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e
)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh
the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e
i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota
eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d
knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e
s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton
rocess
331 Specia1ization Algorithm
Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te
algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~
elamrles
he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~
Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l
osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~
eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e
35
~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor
Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle
mount of specializatlon In Srpc
A-rount_of_spKlaLization( S_) = --_______ (occl
The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture
acKiround knowledge along w1tb the originally discovered substrucure
332 Example
As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a
lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute
-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he
sare substruc~ure emerges as in Figure 3-
S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt
ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule
Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES
A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l
~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli
s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt
Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~
SPECSLBS In thIs example IS
S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt
T~is specialized substructure also las J occurrerces but 3 untque values thus
amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC
specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he
substrucnre background knowledge along with the oriiinally discovered substucur
SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon
about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be
s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred
substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq
SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue
nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization
nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be
mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec
s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks
3 SubstMlcture Background Kllowledge
T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts
39
O T SHA ETRL~GLE SHAPE-SQLARE
Figure 37 Substructure Backgolnd KnowledgeEumpie
ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt
su~ported
As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge
lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle
ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s
SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y
Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute
est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt
he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn
n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and
~1
1--- ~ed v blue ~
i I shyI
I~
I
I
I
I
SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE
Fgure 39 Three Background Knoledge $ucstroJcJres
he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e
new substructu-es In terms of the substructures already known
3 bull 42 IdentifyiD1 Substructures in Eumples
As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-
~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___
~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I
i SH~PETRIAGlE i t
I I
I I I SHAPEmiddotSQlARE
[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]
Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]
Fig-ure 310 Substructure IdentiAcation Eumple
substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles
disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks
--
lll=H Eumple - r
i III
I I H
8r- shyj
V V
Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red
C Value(S)a8
L 6 a
al uelt S)-312
alueltS)-21
0 alue(S)96
middotalue(S)-11
6 I
bt
10 ~ V
- -
I
alue(S)middot31~ aue(S~-96 -- - middotalue(S)92
A I
j
aI ValueS)-312 I
I I
I bull
---
- -I
Vaiue(S)-103
rgure 41 First Etample of EXiIrinent 1
elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS
of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll
EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$
overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0
consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r
be Input examples
2 Ex~riment 2 Specialization and Background Knowledge
The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem
Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0
~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS
lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege
During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task
As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut
discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs
acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa
Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed
subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk
The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure
-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e
cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s
shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese
discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized
substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed
rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces
the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed
substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe
background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to
deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe
background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres
Ii
C 0
H- C C - Ii C1
(Sr C1 rl
C C ~
C C C-H C C- ii C C-Ii 1 i
~ Sr- C C C
H-C C-Ii H C
Ii H
H
fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule
13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples
tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~
of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna
clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn
whIch to focus ile attentIon of the conceptual clusterI~ rocess
A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA
Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer
lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific
Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e
added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features
t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and
CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered
Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe
SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas
r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single
)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d
subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle
substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure
spannng ncre han one ualple
Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni
and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle
53
nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy
descrlOed il [10gensel87J the two best d usterngs discovered are
Color of engine Aheels IS Blackmiddot middotWhitemiddot
W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd
subs-~c~~re discovery =odule is
lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt
In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By
addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI
exa~Fles Je ~wo best lusterngs discovered lre
umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four
Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec
JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure
attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen
har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE
T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~
suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the
StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne
IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree
that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators
within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover
macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~
CiscJsses similrlties and differences between SLBDLE and PLoSD
Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure
of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The
Jperators forths conaln are taken from [isson80] and are repeated e10w
PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY
PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)
STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)
LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)
For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec
e
51
STACK(XZ)
~ CSTACKCYZ) PICKLPOi)
PCTDOW(Y)
Figure 49 Diseovered (alto-operator
system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy
vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be
~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex
45 Eltperiment 5 Data Abstraction and Feature Formation
EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he
cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3
arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng
on the same Tu3S Instruments Eplorer as the SLBDCE system
Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les
given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores
to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in
the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of
178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J
5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C
SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c
61
--
~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll
Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~
hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH
program searCles for tWo tpes of substrucure In tbe blocks world domain The first type
involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a
set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by
the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes
of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall
1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he
ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of
SlBDLE
lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S
cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce
OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence
~ ~ shyB
I ~ (
E G- c c=7 0 IA B CD
i Lr
(a) ( )
63
A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1
0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL
E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL
The tommcm-realiotships-list contairs the four relations Ossessed by more than half the
candidates
Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt
~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le
Sumber of propUiH In Intltwto
Number of propenlH n Ilnlon
vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-
ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre
A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5
65
~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e
ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~
difficult Running both tbe sequence and common relation discovery processes nore tban once
mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl
difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd
domain
T~e final comFarlson of 11e two syseS Involves ~he representation used to store the
discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the
subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe
sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot
lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1
roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse
Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines
comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe
semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto
oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0
SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert
background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l
more general epresentaticn such as the semantic network used y ~he ARCH pro~ran
53 COfDitive Optimization in olrs S~R Program
R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte
~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt
[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S
Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-
-(5-s)C =---shy
s
slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s
represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1
polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a
inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR
seeks a grammar whose eiementS m8llnize eVe
S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n
322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number
of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f
OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings
of he substrucre can be upressed as
Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy
cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not
conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on
he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into
~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture
lmprovements to tbe cognltive savings measure
~ Substructure Oiscovery ill Uihitehalls PLAD System
Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons
These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn
69
The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as
Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs
efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda
overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta
mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence
Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ
COXTEXT 1 ogsav macro OS
100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot
8 5 ~u - (~tmiddot z)middot
Instantiated Action Sequlnce lt B 112 Z ~11J Z
COlEXT 2 ogsav macro~s
20 ~l bull (~lumiddot Z)shy
Instantiated Action ~uence A B 11
Al interesting macrops discovered End
Figure 53 PLAXD Etam~le
1
OLPTER 6
COSC1USION
Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S
of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty
Url about such an environnent the entity nust abstract over unInuresting jetJil and discover
substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s
teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced
domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l
this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve
61 Summary
The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl
onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s
aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e
environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge
equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl
aleratves nvoiec In 3 computational method 01 discovering sustrucure
A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y
he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans
omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg
t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre
imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5
T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~
iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~
he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~
optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~
The ~ialization module modina tile substructure to apply in a more constrained ewironme
Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i
nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S
aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f
be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes
and the possible applications of the SLBDLE substructure discovery system in a vanety f
domaltls
Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf
~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-
lass vf problems in the area of nachine learning OJeraton In real-world domains demands f
eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates
n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll
of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes
Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1
S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~
62 Future Research
$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE
system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei
include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS
Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy
reVIOUS suess of silIlLu rJe appiiauns
esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of
substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can
tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]
lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can
be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions
in both the background knowledge architecture 3nd the opeations peforned by the discovery
algorlthm
Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess
may arise from a better method of substrlctie instantiation C rrent letbocs do not
satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation
by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C
lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres
exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba
Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods
would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery
process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt
the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be
discovered with a single application of the discovery algcnthr1
Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~
uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve
the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery
rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge
z~ess l ~
One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount
knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a
knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts
623 Applications
Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste
SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and
compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung
systems
One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage
composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e
eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon
slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of
abstraction
The n~ to access information fom larse databases has betooe oonlonlac In ncs
omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so
do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver
repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard
Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte
database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture
sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1
9
REFERE~CES
[Doyie79J
[HoaS3]
L --]~ ason
~Iicalski33bl
G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)
J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162
J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27
R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288
K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987
W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933
W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947
J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J
General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16
J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977
D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC
~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239
R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy
R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13
R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363
S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3
T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50
J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp
~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~
83
- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es
DeJ IS 11
dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T
s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11
nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11
race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T
Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order
=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je
~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form
(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I
The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC
The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure
ltela01s s the list ~f ~eatlons deining le substrucure r occurenc
In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage
dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec
85
gt bull ~
)~c1 tt I
0 c5 ~ -
13 Output for First Example of Experiment 1 Limit - 6
3qll rilebullbullbullbull
anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~
Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll
S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi
-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I
(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l
Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD
-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i
S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)
11 aCC
A14 Output for First Example of E(periment 1 Limit 10
gt svIdue liait 10)
n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t
Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco
T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~
S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO
~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS
~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J
A 16 Input for Second Example ot Experiment 1
Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J
conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)
Al7 Output for Second Example ot Etperitnent 1 Limit - 4
l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I
5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs
c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I
middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt
middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I
So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS
coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j
ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103
~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~
89
S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a
ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO
~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H
ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09
lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()
~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull
[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I
l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i
[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~
C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i
SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1
co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i
middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))
Ec ~lce
Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10
Betti ~Ice
i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~
umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1
SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -
1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G
91
s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no
TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~
~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f
=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ
~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i
A2 Experiment 2
Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd
saostructure background knowledge nodule Two examples from the chemical dooain ae
eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng
previously disc ered subst-uc~ures to subsequent discovery tasks
11 Input for First Eumple of poundXperimenl 2
kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i
S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))
slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)
sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J
~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI
Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))
VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)
( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I
(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i
(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I
sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)
Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j
SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)
~o e ) ~Wlnt ubstUC~oltes
Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo
95
gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~
U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l
~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J
carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )
Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))
Defxollrii ilill J ~cIl Clr carJ)
I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)
Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J
ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))
A32 Output for Experiment 3
gt lubclue inl~ JO)
Seli tlcebullbull_
m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil
SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo
aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot
ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~
licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)
Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc
wri OCCtilJtOiCS
101
4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4
~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i
(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))
42 Output for Experiment 4
PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil
Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)
ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl
u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1
O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~
s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI
W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -
~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a
S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1
TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a
103
~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~
Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-
bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)
ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)
I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))
r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l
(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)
(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)
coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)
Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i
(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at
Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e
(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ
~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj
OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs
I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten
~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull
9
SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1
I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1
Addina substructures 0 SKbullbullbull
O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1
5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I
A3 Experimeut 3
Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15
hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le
desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e
99
sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~
5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~
1~Ie c2 lt6 ~u)
QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32
c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)
lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J
SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))
A52 Output for Experiment S
gt (sulxh bullbull lmu )
Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll
RUMinl sulntucture discovey
DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl
Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~
WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11
( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I
(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I
[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt
~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90
Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J
ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot
~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I
10$
bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D
icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)
([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD
((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I
([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~
101
middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a
oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0
CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1
Ed ~lCe
109
QPTER 1
LTRODtCI10N
At any Iven moment the amount of detailed information available fr~m an envlronmel s
overwhelmIng For elample dose observation of a brick wall reveals not only tbe rOINS d
rectang11ar brIckS but also tbe mortAr between tbe bricks the pitted surface of the brIcks and
mortar small cracks In tbe bricks etc Yet hUmans bave the ability to Ignore such detail ard
ettract infermation from the environment at a level of detail that is appropriate for he purpose cf
tbe ocservatlon Even n an unfamiliar environment lumans ignore intricate det1il and iscert
more abstract patterns in tle stimuli of the environment [Witkin83] This thesis describes a
com~utatonal mettod for discovermg abstract patterns or substruct~re in the descriptions 0 a
strucured enVrgtnment
Vhen obser atlons at varying levels of detail are necessary ~uruns are caable of descendrg
nto be more minute structure of the environmental stimuli and centlfymg patterns In arms f
hese stl1cural primitlves From the observations at different levels of detail humans na
construct a lle-archical description of the envirlnoent For nstance tbe brIckmiddot loll an be
descrbed as he rectangular bricks in tbe wall along with tbe interconnecons betmiddotveen tle mcilts
Furthermore each rick can be described in terms of the pitS racks and embedded graIns in he
bnck and each interconnecion can be described by the cl11pcnelts oi h~ mortar that ombme ~
form tle IntercoMecion Thus each Ievei in ~he descrption ~ierarchy rerresents I different eel
of detail for represen~ing ~he environment
Once such a bleoarchy is constructed similar prmitive struc res in diffeent middotWlCrmels
nay suggest tle Uistence of mere abstract featres higher ill n he hlerarhy Tls s~rltl
1
nty reFrese1ng the s~osJc~lre ~S slmpLfrg tergra lU lttl-lt~S I- lcdr e
1ewly ciscoveed substr-cure is s~iaiized by aire~jng adcli0ral s~rmiddotJ~ re T1S )lta ~
s~=stJcture descees l ~ager nore sptcfic poton of the -If~rt set of llut ltumies --e
suostJctU baclqrunc knowledge rcludes botll user-defined and discovered subs~J
definitions These are used to identify instances of substructures tilat exist In a sIVer set of ili
erampies Chapter 2 concludes by outlining a substructure discovery algortlm Incorporatng ~he
aforementlcned cmponents
Chapter 3 describes the implementation of the substructure discovery nethodoiogy ontained
In the StmiddotBDtmiddotE system In addition to the substructure discovery module StBDLE also lcludes a
substructure specaliZltion module and a substructure background know1edge module After a
substructure is d~oveed the substructure lS specIalized and botil tile original and secaiized
substructures are stored in tle background knowlCdge Within the background iinomiddotlledge the
substructures are kept in a hierarclly tbat defines omplu substructures in terms of more primltve
substructures tpon receiving a new set of input examples the background know~dgt rnodJe
determines which of the stored suostructures are resent in the etamples ThJs as he SLBDLE
system runs a hie3rcllical representation of selected structures found in he env ironnent ~s
constructed within tlle background knowledge modJle
Chapter presents several experiments witll the StBDLE systec EXFements wall tne
s1bstructure discovery module indicate that the suostucture generation and substructure seiecton
processes triorm vell in guiding the search towards more interestln~ substructures Other
experiments incorporating botll the substructure background keowledge module lnd spetallutlon
module demonstate tbe performance improvement obtaned by using revlousiy earned
sulstructures In subsequent discovery tASks The input and output data for each exerlent are
isted in ApperdlI A
Chapter strleys -revious work -elated to suostJcure disc)ery T~lS ~orK ilS ak l be euly 1900s Imiddotlen gestalt psychologists began sucying the InderYI~g ncess5 Lmiddotec i
3
CHAPTER 2
stBSTRUCTtJRE DISCOVERY
Te major focus of t1is work IS to investigate methods for discoveing substructure concepts
in examples Substructure dlScovery is tle process of identfYlng concepts desclbl1g n~ere~tIni
and repetitive -hunks- of structure within the individual elements of a set of structured examples
Once such 3 substructure oncept s discovered the descrIptions of the examples can be Simplified
by replacing all occurences of ~he substructure ith a single form that represents the
substructure The simplified descriptions may then be passed to other learning syst~ms In
lddition tbe discovery process nay be apFlied repetitively to urther simplify the deserlptions or
to build a hierarchical interpreuticn of the uamples in terms of hell subparts
ThIS hapter presents ~he components involved in a cgtmilItatlonal nethod for SiJostrJcture
discovery SKtion 21 disc1sses the motivations for developing suet a method anc he importance
of substructure discovery in the 5eld of artificial intelligence Seton 22 1eftnes a stbn~crWt
along with the components comprising a substructure [etods for geneatlng alte-1ative
substructures are presented 1ft Section 21 and the tK1niques used for inteHigently selecting among
these alternatives are discussed in Section 24 Once an appropriate substructure 15 discovered the
substructure is irurantUud for ncb occur-ence of the substructure in the input examples Sect~on
2 discusses substructure irstantiatl0ft ewly discovered sUOstuetures ~an be ipecialized to
describe larger substruc~ures in the input eumples Section 26 disc~sses suosrlctlre
s~aization Section 27 presents methods for using background lo~ledge in the SIoStrlctlre
dlscovery process Finllly Section 28 outlines a substluure Es-oery alsolthm lncorpcrawg
the previously described t~hni~ues
the observaton By tercelvmg only he appropriate Informatior lumans U~ bener abie to el-
the concepts implied by the environment
Thus the undedyu~ motivations for investigating substructure ~overy are the IblqlJlto~
hlenrchical structure in the world and 1e ease with Ntllch bumans can negotiate the henrchy
With an appropriate representation for substructures and the substJcture hierarchy a
substructure discovery sysem can perform the asks Just presented
22 Substructure Representation
A substructure is both a portion of a collection of structurally-related obects and a
cesclption of the concept represented by that portion For uample the detaied structure
CJmposlng ttle concet of a brIck is a substucure of a brck wall Tle collecion of atoms ard
bones compnslng he concept of an aromatic -ing 5 a substrucure of nany Clrgarllc ~cmpouncs
However substrucure concepts are ~ot always nteresng LPOl encountenng aJr1ck wal the
concept of a brick nay not be as interesting as tohe oncept Jf a doorway or Window T~e task of
SUls-ructure discovery is to ~nd i11lDurirtg subsluctue middotoncepts In a given 5-cicltlon d
structurally-related objects The representation of strllcured oncepts sbould be onducive to tile
wk of discovering substructures This sectior resents a grapblcal repfese~tatlon for suctufee
conceptS and a language for describing tne concepts
A coUlCtion of strJctured objects C3n be epfesentec by 1 sTaph CSlng il gnpl
representation a substJctlre is a collection of annotated nodes and eages comt=rlslng a onne~ed
slt~rlph J a larier jTaph The nodes epresenl single oects aics or eiltltols in the
Slbstuc~le lnd tlle edges -epresent the ion~lCtilrs emiddotmiddoteen ea~cns lnd ler arglrents Fr
annctated Ntb gt and o~e In~tl~ed Nltll gtft Te en odt s rnected c ~tl te a ngtce Jnd te
SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St
elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute
Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e
Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the
nput nampie n Figure 211 IS
lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt
red-I ill
green-i SI ~ ~------------~~~lt- --- OBJECT-0001
) ----Rl
1amp --- OBJECT-0002 amp-blue
I~l 154- red LshyI I
I j
blue blue
(1) I1put Example ()) SubstJcure
Fig1Jre 21 Exa~plt Substrlc~re
9
[nput Example Substlcture
A)
V
(b) Object Overlapping Substructure
(c) Object and Relation Overlapping Substructure
Figure 22 Disjoint and Overlappin Substructures
Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~
~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the
densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle
lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1
factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl
Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e
(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing
subsuIctures and substructure speialization to attach contextual nformation to newiy discovered
substructures
2S Substructure mstampl1tialioD
Once an interesting substructure is disltovered the input eumples can be recast by replactng
the obJects and relations of each occurrence of the substructure with a single entity representing
tbe abstact substructure This replacement is termed substructure illstanliatwlt After
pltrforming one substructure instantiation a substrueture discovery system may continue to
discover more abstract substructures in terms of those already instantiated in the input eumples
This section considers two methods of substrueture instantiation
One method of substructure instantiation replaees eaeb occurrence by a single object
Difficulties in using this method arise from representation roblems Although all the ob]fCts and
relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not
replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be
maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only
confounds the insuntiaion problem In the case of object overlap each instartation of the
overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of
relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as
well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre
instantiation using a singie new object Although single object substructure instantiation seens
intuitively promising tbe accompanying representation problems are difficult to overcome
An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of
the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the
slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the
17
3 SubStructure Generation
AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~
and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l
bottom-up and top-down
The bottom-up approach to substructure generation be~lns with the smallest substrJctures n
the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished
by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0
the substructure For enmple according to tbe tllee neighboring relations (presented in Section
2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the
substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures
lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt
lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt
lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt
The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O
S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For
example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10
suestrJures
ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt
The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie
suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0
smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be
11
lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1
jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_
uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly
dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec
substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1
are applicable here
Regardless of how tae metbods are applied each metbod bas advantages and disadvantages
Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev
SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle
combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a
larger substructure but a smaller more desirable substructure may be overlooked in the process
Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS
referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones
linimal expansion begins with smaller substructures expands substructures along one relation
lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource
imlts of tbe system
2~ Substructure Selection
Aier using tbe metbods of the previous sectlon to construct l set of alternatlve
substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0
onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e
roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie
substructures oased on ~her heuristic quality This section presents the major heuristics that 3re
Jpplicable to substructure evaluation
The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata
cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie
savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e
13
Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad
eiatlons of ~be origmal OCClrre1lces may be reconstructea
T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=
replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla
obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue
ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of
concepts deAned In teClS of the Instantiated substructures
26 Substructure Specialization
Specializing substructures is an essential capabtlity of a substructure discovery system For
Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f
mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences
Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the
aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached
atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this
specialization step allows the substructure discovery system to take advantage of additional
information in the input examples avoid learning overly general substructure concepts and Clore
rapidly discover a specific disjunctive substructure concet T115 section resents an approach to
substructure specialization
One technique for specializing a substructure is to ~rform inductive Inference on the
extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the
substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended
occurrences consists of the substructures obtained by upanding each occurrence llth each
neighboring relation of the occurrence Once the set of eltended occurrences is onstructed
Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added
neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg
relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres
19
2 - Background Knowledampe
Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s
wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e
discovery process along more promising paths through tbe space of alternatlve substructJres T~lS
section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate
substructures that are more likely to list in tbe current application dOClaln
The ~wo major functions of tbe substructure backround knowledge are to main tain boh
lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a
glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of
substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu
substructures are defined in terms of more primitive substructures This arrangement suggests ~n
archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie
substructures serve as the justifications for more complex substructures at a higher level in tle
hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote
substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie
substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures
elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying
tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput
uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are
ultimately justified by the relations in tbe input eumples The substructure discoey roess
may then use these background knowledge substructures as a starting point for ieneratllg
alternative substructures
Other f Jrms of background knowledge apply to the task of substrlcure ilscovery
mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses
~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee
1
L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do
BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do
if (not (member e 0raquo S S U leI
return D
Figure 24 SubstrUcture Discovery ~lgoritbm
Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom
the substructures In S until either the computational limlt L is exceeded or the set of candidate
substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe
heuristics of Section 2A are employed to choose the best substructure from the llternatives in S
The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he
implementation Section 322 descibes the comrutatlon used in StBDlE although otbe
methods may be used For instance the heutlStics could be weighted selected by lhe user or
selected by lhe backcround knowledae However uy substructure selecion method should
involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in
BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of
discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi
Section 23 are then used to construct a set of new substructures that are stored in E Those
subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the
loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res
23
CHAPTER 3
mE SUBDUE SYSTE~I
An implementation of he substructure discovery methodology descrIbed in the frevous
chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments
Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a
substructure concept discoverer and as a mod le in a more robust nachlne learning system In
addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a
sbstructure specialization module and a substructure background knowledge module for utiiizilg
prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd
krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres
ovillCh of these substruc1res are resent in the lnput examples
i
SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of
SubstructJre Speeializer-k of(
C ser-Supplied Input Exampies Substructure Discovery
Figure 31 Tlle SlSDLE System
(a) SUbstructure
[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]
(b) External Representation
I I
~ Sl
i V
~-T t -
I
~ Cl i
(c) Inte~al Represe~tation
Figure 32 Substructure Representation
21
SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do
SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI
return -EWSlBS
Figure 33 Substructure Elpanslon Algorithm
Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae
stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are
stored in For each neighboring relation a new substructure is formed by adding the nelghborrg
-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to
the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een
considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes
~xanded from the original substructure
322 Selection
The heuristic-based substructure discovery module selects for orslderation those
substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs
compactness connectivity and covenge These four heurlstus are used 0 order he set of
alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut
ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~
selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons
~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he
eurstlc value of a substructur
29
urlent set of Input xam-llS
deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e
substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le
Input examples
r rtlations Sl compactnns(S) = ----shy
For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus
conpactness bull 4 bull 1
The third heUristic connectivity measures the amount of extemal connection in 1he
occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of
external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be
onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed
connectivit-Spound) = locel
Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In
Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure
22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO
outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons
Thus connectivity (12 41 bull 13
T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es
31
ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt
First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~
tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as
there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle
substructure as denned in Section 3U The Object names within the substructures le aroltrar
symbols generated by the system for eacb ewly constructed substructure
S bull i lt 11 OBJECT-0001gt f
~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S
and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by
addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5
ae st)red In E
Input Example Substructure
amp i 511
~I Rl I- lSi ~
52 I S I
LJ L
Figure 3-1 Simple Example
33
11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~
1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l
le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~
suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~
by the algorabm
33 Sllbstructure Specialization
The substructure specialiution module In SLBDLE employs t simple teChlllq ue r
s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25
SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added
attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e
)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh
the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e
i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota
eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d
knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e
s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton
rocess
331 Specia1ization Algorithm
Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te
algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~
elamrles
he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~
Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l
osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~
eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e
35
~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor
Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle
mount of specializatlon In Srpc
A-rount_of_spKlaLization( S_) = --_______ (occl
The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture
acKiround knowledge along w1tb the originally discovered substrucure
332 Example
As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a
lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute
-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he
sare substruc~ure emerges as in Figure 3-
S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt
ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule
Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES
A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l
~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli
s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt
Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~
SPECSLBS In thIs example IS
S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt
T~is specialized substructure also las J occurrerces but 3 untque values thus
amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC
specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he
substrucnre background knowledge along with the oriiinally discovered substucur
SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon
about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be
s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred
substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq
SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue
nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization
nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be
mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec
s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks
3 SubstMlcture Background Kllowledge
T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts
39
O T SHA ETRL~GLE SHAPE-SQLARE
Figure 37 Substructure Backgolnd KnowledgeEumpie
ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt
su~ported
As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge
lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle
ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s
SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y
Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute
est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt
he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn
n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and
~1
1--- ~ed v blue ~
i I shyI
I~
I
I
I
I
SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE
Fgure 39 Three Background Knoledge $ucstroJcJres
he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e
new substructu-es In terms of the substructures already known
3 bull 42 IdentifyiD1 Substructures in Eumples
As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-
~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___
~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I
i SH~PETRIAGlE i t
I I
I I I SHAPEmiddotSQlARE
[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]
Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]
Fig-ure 310 Substructure IdentiAcation Eumple
substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles
disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks
--
lll=H Eumple - r
i III
I I H
8r- shyj
V V
Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red
C Value(S)a8
L 6 a
al uelt S)-312
alueltS)-21
0 alue(S)96
middotalue(S)-11
6 I
bt
10 ~ V
- -
I
alue(S)middot31~ aue(S~-96 -- - middotalue(S)92
A I
j
aI ValueS)-312 I
I I
I bull
---
- -I
Vaiue(S)-103
rgure 41 First Etample of EXiIrinent 1
elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS
of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll
EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$
overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0
consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r
be Input examples
2 Ex~riment 2 Specialization and Background Knowledge
The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem
Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0
~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS
lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege
During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task
As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut
discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs
acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa
Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed
subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk
The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure
-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e
cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s
shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese
discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized
substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed
rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces
the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed
substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe
background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to
deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe
background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres
Ii
C 0
H- C C - Ii C1
(Sr C1 rl
C C ~
C C C-H C C- ii C C-Ii 1 i
~ Sr- C C C
H-C C-Ii H C
Ii H
H
fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule
13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples
tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~
of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna
clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn
whIch to focus ile attentIon of the conceptual clusterI~ rocess
A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA
Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer
lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific
Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e
added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features
t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and
CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered
Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe
SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas
r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single
)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d
subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle
substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure
spannng ncre han one ualple
Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni
and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle
53
nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy
descrlOed il [10gensel87J the two best d usterngs discovered are
Color of engine Aheels IS Blackmiddot middotWhitemiddot
W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd
subs-~c~~re discovery =odule is
lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt
In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By
addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI
exa~Fles Je ~wo best lusterngs discovered lre
umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four
Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec
JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure
attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen
har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE
T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~
suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the
StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne
IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree
that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators
within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover
macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~
CiscJsses similrlties and differences between SLBDLE and PLoSD
Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure
of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The
Jperators forths conaln are taken from [isson80] and are repeated e10w
PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY
PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)
STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)
LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)
For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec
e
51
STACK(XZ)
~ CSTACKCYZ) PICKLPOi)
PCTDOW(Y)
Figure 49 Diseovered (alto-operator
system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy
vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be
~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex
45 Eltperiment 5 Data Abstraction and Feature Formation
EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he
cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3
arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng
on the same Tu3S Instruments Eplorer as the SLBDCE system
Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les
given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores
to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in
the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of
178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J
5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C
SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c
61
--
~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll
Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~
hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH
program searCles for tWo tpes of substrucure In tbe blocks world domain The first type
involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a
set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by
the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes
of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall
1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he
ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of
SlBDLE
lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S
cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce
OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence
~ ~ shyB
I ~ (
E G- c c=7 0 IA B CD
i Lr
(a) ( )
63
A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1
0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL
E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL
The tommcm-realiotships-list contairs the four relations Ossessed by more than half the
candidates
Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt
~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le
Sumber of propUiH In Intltwto
Number of propenlH n Ilnlon
vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-
ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre
A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5
65
~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e
ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~
difficult Running both tbe sequence and common relation discovery processes nore tban once
mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl
difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd
domain
T~e final comFarlson of 11e two syseS Involves ~he representation used to store the
discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the
subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe
sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot
lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1
roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse
Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines
comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe
semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto
oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0
SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert
background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l
more general epresentaticn such as the semantic network used y ~he ARCH pro~ran
53 COfDitive Optimization in olrs S~R Program
R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte
~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt
[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S
Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-
-(5-s)C =---shy
s
slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s
represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1
polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a
inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR
seeks a grammar whose eiementS m8llnize eVe
S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n
322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number
of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f
OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings
of he substrucre can be upressed as
Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy
cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not
conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on
he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into
~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture
lmprovements to tbe cognltive savings measure
~ Substructure Oiscovery ill Uihitehalls PLAD System
Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons
These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn
69
The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as
Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs
efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda
overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta
mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence
Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ
COXTEXT 1 ogsav macro OS
100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot
8 5 ~u - (~tmiddot z)middot
Instantiated Action Sequlnce lt B 112 Z ~11J Z
COlEXT 2 ogsav macro~s
20 ~l bull (~lumiddot Z)shy
Instantiated Action ~uence A B 11
Al interesting macrops discovered End
Figure 53 PLAXD Etam~le
1
OLPTER 6
COSC1USION
Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S
of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty
Url about such an environnent the entity nust abstract over unInuresting jetJil and discover
substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s
teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced
domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l
this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve
61 Summary
The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl
onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s
aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e
environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge
equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl
aleratves nvoiec In 3 computational method 01 discovering sustrucure
A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y
he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans
omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg
t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre
imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5
T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~
iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~
he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~
optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~
The ~ialization module modina tile substructure to apply in a more constrained ewironme
Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i
nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S
aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f
be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes
and the possible applications of the SLBDLE substructure discovery system in a vanety f
domaltls
Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf
~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-
lass vf problems in the area of nachine learning OJeraton In real-world domains demands f
eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates
n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll
of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes
Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1
S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~
62 Future Research
$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE
system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei
include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS
Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy
reVIOUS suess of silIlLu rJe appiiauns
esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of
substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can
tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]
lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can
be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions
in both the background knowledge architecture 3nd the opeations peforned by the discovery
algorlthm
Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess
may arise from a better method of substrlctie instantiation C rrent letbocs do not
satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation
by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C
lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres
exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba
Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods
would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery
process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt
the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be
discovered with a single application of the discovery algcnthr1
Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~
uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve
the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery
rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge
z~ess l ~
One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount
knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a
knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts
623 Applications
Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste
SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and
compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung
systems
One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage
composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e
eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon
slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of
abstraction
The n~ to access information fom larse databases has betooe oonlonlac In ncs
omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so
do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver
repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard
Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte
database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture
sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1
9
REFERE~CES
[Doyie79J
[HoaS3]
L --]~ ason
~Iicalski33bl
G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)
J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162
J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27
R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288
K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987
W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933
W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947
J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J
General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16
J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977
D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC
~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239
R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy
R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13
R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363
S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3
T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50
J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp
~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~
83
- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es
DeJ IS 11
dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T
s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11
nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11
race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T
Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order
=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je
~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form
(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I
The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC
The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure
ltela01s s the list ~f ~eatlons deining le substrucure r occurenc
In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage
dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec
85
gt bull ~
)~c1 tt I
0 c5 ~ -
13 Output for First Example of Experiment 1 Limit - 6
3qll rilebullbullbullbull
anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~
Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll
S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi
-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I
(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l
Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD
-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i
S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)
11 aCC
A14 Output for First Example of E(periment 1 Limit 10
gt svIdue liait 10)
n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t
Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco
T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~
S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO
~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS
~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J
A 16 Input for Second Example ot Experiment 1
Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J
conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)
Al7 Output for Second Example ot Etperitnent 1 Limit - 4
l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I
5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs
c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I
middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt
middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I
So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS
coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j
ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103
~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~
89
S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a
ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO
~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H
ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09
lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()
~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull
[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I
l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i
[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~
C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i
SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1
co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i
middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))
Ec ~lce
Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10
Betti ~Ice
i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~
umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1
SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -
1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G
91
s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no
TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~
~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f
=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ
~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i
A2 Experiment 2
Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd
saostructure background knowledge nodule Two examples from the chemical dooain ae
eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng
previously disc ered subst-uc~ures to subsequent discovery tasks
11 Input for First Eumple of poundXperimenl 2
kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i
S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))
slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)
sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J
~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI
Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))
VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)
( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I
(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i
(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I
sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)
Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j
SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)
~o e ) ~Wlnt ubstUC~oltes
Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo
95
gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~
U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l
~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J
carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )
Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))
Defxollrii ilill J ~cIl Clr carJ)
I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)
Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J
ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))
A32 Output for Experiment 3
gt lubclue inl~ JO)
Seli tlcebullbull_
m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil
SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo
aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot
ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~
licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)
Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc
wri OCCtilJtOiCS
101
4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4
~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i
(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))
42 Output for Experiment 4
PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil
Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)
ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl
u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1
O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~
s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI
W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -
~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a
S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1
TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a
103
~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~
Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-
bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)
ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)
I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))
r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l
(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)
(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)
coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)
Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i
(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at
Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e
(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ
~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj
OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs
I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten
~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull
9
SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1
I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1
Addina substructures 0 SKbullbullbull
O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1
5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I
A3 Experimeut 3
Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15
hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le
desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e
99
sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~
5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~
1~Ie c2 lt6 ~u)
QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32
c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)
lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J
SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))
A52 Output for Experiment S
gt (sulxh bullbull lmu )
Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll
RUMinl sulntucture discovey
DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl
Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~
WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11
( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I
(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I
[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt
~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90
Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J
ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot
~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I
10$
bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D
icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)
([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD
((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I
([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~
101
middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a
oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0
CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1
Ed ~lCe
109
nty reFrese1ng the s~osJc~lre ~S slmpLfrg tergra lU lttl-lt~S I- lcdr e
1ewly ciscoveed substr-cure is s~iaiized by aire~jng adcli0ral s~rmiddotJ~ re T1S )lta ~
s~=stJcture descees l ~ager nore sptcfic poton of the -If~rt set of llut ltumies --e
suostJctU baclqrunc knowledge rcludes botll user-defined and discovered subs~J
definitions These are used to identify instances of substructures tilat exist In a sIVer set of ili
erampies Chapter 2 concludes by outlining a substructure discovery algortlm Incorporatng ~he
aforementlcned cmponents
Chapter 3 describes the implementation of the substructure discovery nethodoiogy ontained
In the StmiddotBDtmiddotE system In addition to the substructure discovery module StBDLE also lcludes a
substructure specaliZltion module and a substructure background know1edge module After a
substructure is d~oveed the substructure lS specIalized and botil tile original and secaiized
substructures are stored in tle background knowlCdge Within the background iinomiddotlledge the
substructures are kept in a hierarclly tbat defines omplu substructures in terms of more primltve
substructures tpon receiving a new set of input examples the background know~dgt rnodJe
determines which of the stored suostructures are resent in the etamples ThJs as he SLBDLE
system runs a hie3rcllical representation of selected structures found in he env ironnent ~s
constructed within tlle background knowledge modJle
Chapter presents several experiments witll the StBDLE systec EXFements wall tne
s1bstructure discovery module indicate that the suostucture generation and substructure seiecton
processes triorm vell in guiding the search towards more interestln~ substructures Other
experiments incorporating botll the substructure background keowledge module lnd spetallutlon
module demonstate tbe performance improvement obtaned by using revlousiy earned
sulstructures In subsequent discovery tASks The input and output data for each exerlent are
isted in ApperdlI A
Chapter strleys -revious work -elated to suostJcure disc)ery T~lS ~orK ilS ak l be euly 1900s Imiddotlen gestalt psychologists began sucying the InderYI~g ncess5 Lmiddotec i
3
CHAPTER 2
stBSTRUCTtJRE DISCOVERY
Te major focus of t1is work IS to investigate methods for discoveing substructure concepts
in examples Substructure dlScovery is tle process of identfYlng concepts desclbl1g n~ere~tIni
and repetitive -hunks- of structure within the individual elements of a set of structured examples
Once such 3 substructure oncept s discovered the descrIptions of the examples can be Simplified
by replacing all occurences of ~he substructure ith a single form that represents the
substructure The simplified descriptions may then be passed to other learning syst~ms In
lddition tbe discovery process nay be apFlied repetitively to urther simplify the deserlptions or
to build a hierarchical interpreuticn of the uamples in terms of hell subparts
ThIS hapter presents ~he components involved in a cgtmilItatlonal nethod for SiJostrJcture
discovery SKtion 21 disc1sses the motivations for developing suet a method anc he importance
of substructure discovery in the 5eld of artificial intelligence Seton 22 1eftnes a stbn~crWt
along with the components comprising a substructure [etods for geneatlng alte-1ative
substructures are presented 1ft Section 21 and the tK1niques used for inteHigently selecting among
these alternatives are discussed in Section 24 Once an appropriate substructure 15 discovered the
substructure is irurantUud for ncb occur-ence of the substructure in the input examples Sect~on
2 discusses substructure irstantiatl0ft ewly discovered sUOstuetures ~an be ipecialized to
describe larger substruc~ures in the input eumples Section 26 disc~sses suosrlctlre
s~aization Section 27 presents methods for using background lo~ledge in the SIoStrlctlre
dlscovery process Finllly Section 28 outlines a substluure Es-oery alsolthm lncorpcrawg
the previously described t~hni~ues
the observaton By tercelvmg only he appropriate Informatior lumans U~ bener abie to el-
the concepts implied by the environment
Thus the undedyu~ motivations for investigating substructure ~overy are the IblqlJlto~
hlenrchical structure in the world and 1e ease with Ntllch bumans can negotiate the henrchy
With an appropriate representation for substructures and the substJcture hierarchy a
substructure discovery sysem can perform the asks Just presented
22 Substructure Representation
A substructure is both a portion of a collection of structurally-related obects and a
cesclption of the concept represented by that portion For uample the detaied structure
CJmposlng ttle concet of a brIck is a substucure of a brck wall Tle collecion of atoms ard
bones compnslng he concept of an aromatic -ing 5 a substrucure of nany Clrgarllc ~cmpouncs
However substrucure concepts are ~ot always nteresng LPOl encountenng aJr1ck wal the
concept of a brick nay not be as interesting as tohe oncept Jf a doorway or Window T~e task of
SUls-ructure discovery is to ~nd i11lDurirtg subsluctue middotoncepts In a given 5-cicltlon d
structurally-related objects The representation of strllcured oncepts sbould be onducive to tile
wk of discovering substructures This sectior resents a grapblcal repfese~tatlon for suctufee
conceptS and a language for describing tne concepts
A coUlCtion of strJctured objects C3n be epfesentec by 1 sTaph CSlng il gnpl
representation a substJctlre is a collection of annotated nodes and eages comt=rlslng a onne~ed
slt~rlph J a larier jTaph The nodes epresenl single oects aics or eiltltols in the
Slbstuc~le lnd tlle edges -epresent the ion~lCtilrs emiddotmiddoteen ea~cns lnd ler arglrents Fr
annctated Ntb gt and o~e In~tl~ed Nltll gtft Te en odt s rnected c ~tl te a ngtce Jnd te
SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St
elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute
Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e
Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the
nput nampie n Figure 211 IS
lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt
red-I ill
green-i SI ~ ~------------~~~lt- --- OBJECT-0001
) ----Rl
1amp --- OBJECT-0002 amp-blue
I~l 154- red LshyI I
I j
blue blue
(1) I1put Example ()) SubstJcure
Fig1Jre 21 Exa~plt Substrlc~re
9
[nput Example Substlcture
A)
V
(b) Object Overlapping Substructure
(c) Object and Relation Overlapping Substructure
Figure 22 Disjoint and Overlappin Substructures
Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~
~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the
densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle
lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1
factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl
Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e
(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing
subsuIctures and substructure speialization to attach contextual nformation to newiy discovered
substructures
2S Substructure mstampl1tialioD
Once an interesting substructure is disltovered the input eumples can be recast by replactng
the obJects and relations of each occurrence of the substructure with a single entity representing
tbe abstact substructure This replacement is termed substructure illstanliatwlt After
pltrforming one substructure instantiation a substrueture discovery system may continue to
discover more abstract substructures in terms of those already instantiated in the input eumples
This section considers two methods of substrueture instantiation
One method of substructure instantiation replaees eaeb occurrence by a single object
Difficulties in using this method arise from representation roblems Although all the ob]fCts and
relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not
replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be
maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only
confounds the insuntiaion problem In the case of object overlap each instartation of the
overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of
relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as
well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre
instantiation using a singie new object Although single object substructure instantiation seens
intuitively promising tbe accompanying representation problems are difficult to overcome
An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of
the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the
slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the
17
3 SubStructure Generation
AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~
and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l
bottom-up and top-down
The bottom-up approach to substructure generation be~lns with the smallest substrJctures n
the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished
by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0
the substructure For enmple according to tbe tllee neighboring relations (presented in Section
2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the
substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures
lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt
lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt
lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt
The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O
S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For
example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10
suestrJures
ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt
The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie
suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0
smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be
11
lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1
jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_
uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly
dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec
substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1
are applicable here
Regardless of how tae metbods are applied each metbod bas advantages and disadvantages
Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev
SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle
combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a
larger substructure but a smaller more desirable substructure may be overlooked in the process
Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS
referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones
linimal expansion begins with smaller substructures expands substructures along one relation
lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource
imlts of tbe system
2~ Substructure Selection
Aier using tbe metbods of the previous sectlon to construct l set of alternatlve
substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0
onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e
roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie
substructures oased on ~her heuristic quality This section presents the major heuristics that 3re
Jpplicable to substructure evaluation
The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata
cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie
savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e
13
Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad
eiatlons of ~be origmal OCClrre1lces may be reconstructea
T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=
replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla
obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue
ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of
concepts deAned In teClS of the Instantiated substructures
26 Substructure Specialization
Specializing substructures is an essential capabtlity of a substructure discovery system For
Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f
mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences
Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the
aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached
atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this
specialization step allows the substructure discovery system to take advantage of additional
information in the input examples avoid learning overly general substructure concepts and Clore
rapidly discover a specific disjunctive substructure concet T115 section resents an approach to
substructure specialization
One technique for specializing a substructure is to ~rform inductive Inference on the
extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the
substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended
occurrences consists of the substructures obtained by upanding each occurrence llth each
neighboring relation of the occurrence Once the set of eltended occurrences is onstructed
Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added
neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg
relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres
19
2 - Background Knowledampe
Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s
wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e
discovery process along more promising paths through tbe space of alternatlve substructJres T~lS
section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate
substructures that are more likely to list in tbe current application dOClaln
The ~wo major functions of tbe substructure backround knowledge are to main tain boh
lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a
glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of
substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu
substructures are defined in terms of more primitive substructures This arrangement suggests ~n
archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie
substructures serve as the justifications for more complex substructures at a higher level in tle
hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote
substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie
substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures
elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying
tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput
uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are
ultimately justified by the relations in tbe input eumples The substructure discoey roess
may then use these background knowledge substructures as a starting point for ieneratllg
alternative substructures
Other f Jrms of background knowledge apply to the task of substrlcure ilscovery
mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses
~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee
1
L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do
BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do
if (not (member e 0raquo S S U leI
return D
Figure 24 SubstrUcture Discovery ~lgoritbm
Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom
the substructures In S until either the computational limlt L is exceeded or the set of candidate
substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe
heuristics of Section 2A are employed to choose the best substructure from the llternatives in S
The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he
implementation Section 322 descibes the comrutatlon used in StBDlE although otbe
methods may be used For instance the heutlStics could be weighted selected by lhe user or
selected by lhe backcround knowledae However uy substructure selecion method should
involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in
BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of
discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi
Section 23 are then used to construct a set of new substructures that are stored in E Those
subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the
loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res
23
CHAPTER 3
mE SUBDUE SYSTE~I
An implementation of he substructure discovery methodology descrIbed in the frevous
chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments
Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a
substructure concept discoverer and as a mod le in a more robust nachlne learning system In
addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a
sbstructure specialization module and a substructure background knowledge module for utiiizilg
prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd
krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres
ovillCh of these substruc1res are resent in the lnput examples
i
SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of
SubstructJre Speeializer-k of(
C ser-Supplied Input Exampies Substructure Discovery
Figure 31 Tlle SlSDLE System
(a) SUbstructure
[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]
(b) External Representation
I I
~ Sl
i V
~-T t -
I
~ Cl i
(c) Inte~al Represe~tation
Figure 32 Substructure Representation
21
SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do
SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI
return -EWSlBS
Figure 33 Substructure Elpanslon Algorithm
Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae
stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are
stored in For each neighboring relation a new substructure is formed by adding the nelghborrg
-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to
the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een
considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes
~xanded from the original substructure
322 Selection
The heuristic-based substructure discovery module selects for orslderation those
substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs
compactness connectivity and covenge These four heurlstus are used 0 order he set of
alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut
ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~
selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons
~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he
eurstlc value of a substructur
29
urlent set of Input xam-llS
deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e
substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le
Input examples
r rtlations Sl compactnns(S) = ----shy
For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus
conpactness bull 4 bull 1
The third heUristic connectivity measures the amount of extemal connection in 1he
occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of
external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be
onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed
connectivit-Spound) = locel
Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In
Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure
22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO
outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons
Thus connectivity (12 41 bull 13
T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es
31
ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt
First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~
tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as
there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle
substructure as denned in Section 3U The Object names within the substructures le aroltrar
symbols generated by the system for eacb ewly constructed substructure
S bull i lt 11 OBJECT-0001gt f
~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S
and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by
addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5
ae st)red In E
Input Example Substructure
amp i 511
~I Rl I- lSi ~
52 I S I
LJ L
Figure 3-1 Simple Example
33
11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~
1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l
le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~
suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~
by the algorabm
33 Sllbstructure Specialization
The substructure specialiution module In SLBDLE employs t simple teChlllq ue r
s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25
SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added
attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e
)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh
the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e
i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota
eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d
knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e
s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton
rocess
331 Specia1ization Algorithm
Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te
algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~
elamrles
he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~
Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l
osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~
eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e
35
~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor
Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle
mount of specializatlon In Srpc
A-rount_of_spKlaLization( S_) = --_______ (occl
The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture
acKiround knowledge along w1tb the originally discovered substrucure
332 Example
As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a
lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute
-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he
sare substruc~ure emerges as in Figure 3-
S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt
ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule
Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES
A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l
~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli
s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt
Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~
SPECSLBS In thIs example IS
S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt
T~is specialized substructure also las J occurrerces but 3 untque values thus
amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC
specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he
substrucnre background knowledge along with the oriiinally discovered substucur
SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon
about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be
s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred
substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq
SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue
nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization
nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be
mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec
s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks
3 SubstMlcture Background Kllowledge
T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts
39
O T SHA ETRL~GLE SHAPE-SQLARE
Figure 37 Substructure Backgolnd KnowledgeEumpie
ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt
su~ported
As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge
lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle
ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s
SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y
Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute
est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt
he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn
n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and
~1
1--- ~ed v blue ~
i I shyI
I~
I
I
I
I
SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE
Fgure 39 Three Background Knoledge $ucstroJcJres
he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e
new substructu-es In terms of the substructures already known
3 bull 42 IdentifyiD1 Substructures in Eumples
As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-
~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___
~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I
i SH~PETRIAGlE i t
I I
I I I SHAPEmiddotSQlARE
[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]
Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]
Fig-ure 310 Substructure IdentiAcation Eumple
substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles
disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks
--
lll=H Eumple - r
i III
I I H
8r- shyj
V V
Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red
C Value(S)a8
L 6 a
al uelt S)-312
alueltS)-21
0 alue(S)96
middotalue(S)-11
6 I
bt
10 ~ V
- -
I
alue(S)middot31~ aue(S~-96 -- - middotalue(S)92
A I
j
aI ValueS)-312 I
I I
I bull
---
- -I
Vaiue(S)-103
rgure 41 First Etample of EXiIrinent 1
elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS
of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll
EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$
overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0
consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r
be Input examples
2 Ex~riment 2 Specialization and Background Knowledge
The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem
Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0
~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS
lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege
During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task
As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut
discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs
acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa
Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed
subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk
The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure
-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e
cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s
shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese
discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized
substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed
rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces
the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed
substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe
background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to
deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe
background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres
Ii
C 0
H- C C - Ii C1
(Sr C1 rl
C C ~
C C C-H C C- ii C C-Ii 1 i
~ Sr- C C C
H-C C-Ii H C
Ii H
H
fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule
13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples
tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~
of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna
clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn
whIch to focus ile attentIon of the conceptual clusterI~ rocess
A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA
Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer
lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific
Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e
added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features
t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and
CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered
Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe
SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas
r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single
)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d
subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle
substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure
spannng ncre han one ualple
Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni
and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle
53
nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy
descrlOed il [10gensel87J the two best d usterngs discovered are
Color of engine Aheels IS Blackmiddot middotWhitemiddot
W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd
subs-~c~~re discovery =odule is
lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt
In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By
addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI
exa~Fles Je ~wo best lusterngs discovered lre
umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four
Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec
JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure
attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen
har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE
T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~
suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the
StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne
IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree
that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators
within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover
macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~
CiscJsses similrlties and differences between SLBDLE and PLoSD
Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure
of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The
Jperators forths conaln are taken from [isson80] and are repeated e10w
PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY
PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)
STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)
LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)
For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec
e
51
STACK(XZ)
~ CSTACKCYZ) PICKLPOi)
PCTDOW(Y)
Figure 49 Diseovered (alto-operator
system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy
vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be
~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex
45 Eltperiment 5 Data Abstraction and Feature Formation
EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he
cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3
arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng
on the same Tu3S Instruments Eplorer as the SLBDCE system
Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les
given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores
to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in
the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of
178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J
5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C
SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c
61
--
~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll
Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~
hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH
program searCles for tWo tpes of substrucure In tbe blocks world domain The first type
involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a
set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by
the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes
of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall
1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he
ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of
SlBDLE
lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S
cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce
OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence
~ ~ shyB
I ~ (
E G- c c=7 0 IA B CD
i Lr
(a) ( )
63
A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1
0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL
E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL
The tommcm-realiotships-list contairs the four relations Ossessed by more than half the
candidates
Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt
~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le
Sumber of propUiH In Intltwto
Number of propenlH n Ilnlon
vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-
ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre
A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5
65
~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e
ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~
difficult Running both tbe sequence and common relation discovery processes nore tban once
mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl
difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd
domain
T~e final comFarlson of 11e two syseS Involves ~he representation used to store the
discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the
subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe
sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot
lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1
roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse
Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines
comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe
semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto
oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0
SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert
background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l
more general epresentaticn such as the semantic network used y ~he ARCH pro~ran
53 COfDitive Optimization in olrs S~R Program
R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte
~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt
[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S
Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-
-(5-s)C =---shy
s
slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s
represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1
polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a
inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR
seeks a grammar whose eiementS m8llnize eVe
S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n
322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number
of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f
OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings
of he substrucre can be upressed as
Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy
cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not
conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on
he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into
~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture
lmprovements to tbe cognltive savings measure
~ Substructure Oiscovery ill Uihitehalls PLAD System
Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons
These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn
69
The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as
Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs
efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda
overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta
mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence
Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ
COXTEXT 1 ogsav macro OS
100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot
8 5 ~u - (~tmiddot z)middot
Instantiated Action Sequlnce lt B 112 Z ~11J Z
COlEXT 2 ogsav macro~s
20 ~l bull (~lumiddot Z)shy
Instantiated Action ~uence A B 11
Al interesting macrops discovered End
Figure 53 PLAXD Etam~le
1
OLPTER 6
COSC1USION
Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S
of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty
Url about such an environnent the entity nust abstract over unInuresting jetJil and discover
substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s
teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced
domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l
this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve
61 Summary
The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl
onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s
aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e
environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge
equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl
aleratves nvoiec In 3 computational method 01 discovering sustrucure
A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y
he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans
omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg
t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre
imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5
T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~
iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~
he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~
optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~
The ~ialization module modina tile substructure to apply in a more constrained ewironme
Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i
nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S
aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f
be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes
and the possible applications of the SLBDLE substructure discovery system in a vanety f
domaltls
Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf
~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-
lass vf problems in the area of nachine learning OJeraton In real-world domains demands f
eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates
n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll
of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes
Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1
S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~
62 Future Research
$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE
system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei
include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS
Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy
reVIOUS suess of silIlLu rJe appiiauns
esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of
substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can
tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]
lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can
be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions
in both the background knowledge architecture 3nd the opeations peforned by the discovery
algorlthm
Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess
may arise from a better method of substrlctie instantiation C rrent letbocs do not
satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation
by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C
lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres
exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba
Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods
would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery
process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt
the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be
discovered with a single application of the discovery algcnthr1
Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~
uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve
the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery
rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge
z~ess l ~
One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount
knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a
knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts
623 Applications
Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste
SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and
compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung
systems
One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage
composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e
eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon
slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of
abstraction
The n~ to access information fom larse databases has betooe oonlonlac In ncs
omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so
do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver
repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard
Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte
database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture
sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1
9
REFERE~CES
[Doyie79J
[HoaS3]
L --]~ ason
~Iicalski33bl
G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)
J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162
J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27
R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288
K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987
W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933
W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947
J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J
General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16
J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977
D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC
~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239
R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy
R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13
R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363
S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3
T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50
J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp
~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~
83
- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es
DeJ IS 11
dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T
s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11
nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11
race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T
Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order
=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je
~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form
(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I
The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC
The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure
ltela01s s the list ~f ~eatlons deining le substrucure r occurenc
In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage
dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec
85
gt bull ~
)~c1 tt I
0 c5 ~ -
13 Output for First Example of Experiment 1 Limit - 6
3qll rilebullbullbullbull
anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~
Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll
S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi
-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I
(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l
Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD
-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i
S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)
11 aCC
A14 Output for First Example of E(periment 1 Limit 10
gt svIdue liait 10)
n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t
Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco
T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~
S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO
~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS
~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J
A 16 Input for Second Example ot Experiment 1
Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J
conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)
Al7 Output for Second Example ot Etperitnent 1 Limit - 4
l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I
5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs
c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I
middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt
middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I
So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS
coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j
ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103
~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~
89
S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a
ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO
~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H
ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09
lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()
~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull
[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I
l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i
[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~
C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i
SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1
co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i
middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))
Ec ~lce
Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10
Betti ~Ice
i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~
umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1
SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -
1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G
91
s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no
TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~
~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f
=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ
~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i
A2 Experiment 2
Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd
saostructure background knowledge nodule Two examples from the chemical dooain ae
eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng
previously disc ered subst-uc~ures to subsequent discovery tasks
11 Input for First Eumple of poundXperimenl 2
kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i
S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))
slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)
sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J
~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI
Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))
VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)
( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I
(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i
(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I
sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)
Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j
SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)
~o e ) ~Wlnt ubstUC~oltes
Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo
95
gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~
U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l
~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J
carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )
Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))
Defxollrii ilill J ~cIl Clr carJ)
I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)
Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J
ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))
A32 Output for Experiment 3
gt lubclue inl~ JO)
Seli tlcebullbull_
m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil
SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo
aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot
ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~
licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)
Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc
wri OCCtilJtOiCS
101
4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4
~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i
(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))
42 Output for Experiment 4
PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil
Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)
ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl
u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1
O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~
s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI
W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -
~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a
S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1
TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a
103
~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~
Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-
bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)
ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)
I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))
r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l
(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)
(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)
coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)
Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i
(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at
Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e
(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ
~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj
OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs
I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten
~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull
9
SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1
I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1
Addina substructures 0 SKbullbullbull
O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1
5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I
A3 Experimeut 3
Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15
hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le
desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e
99
sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~
5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~
1~Ie c2 lt6 ~u)
QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32
c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)
lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J
SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))
A52 Output for Experiment S
gt (sulxh bullbull lmu )
Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll
RUMinl sulntucture discovey
DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl
Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~
WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11
( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I
(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I
[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt
~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90
Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J
ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot
~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I
10$
bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D
icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)
([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD
((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I
([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~
101
middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a
oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0
CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1
Ed ~lCe
109
CHAPTER 2
stBSTRUCTtJRE DISCOVERY
Te major focus of t1is work IS to investigate methods for discoveing substructure concepts
in examples Substructure dlScovery is tle process of identfYlng concepts desclbl1g n~ere~tIni
and repetitive -hunks- of structure within the individual elements of a set of structured examples
Once such 3 substructure oncept s discovered the descrIptions of the examples can be Simplified
by replacing all occurences of ~he substructure ith a single form that represents the
substructure The simplified descriptions may then be passed to other learning syst~ms In
lddition tbe discovery process nay be apFlied repetitively to urther simplify the deserlptions or
to build a hierarchical interpreuticn of the uamples in terms of hell subparts
ThIS hapter presents ~he components involved in a cgtmilItatlonal nethod for SiJostrJcture
discovery SKtion 21 disc1sses the motivations for developing suet a method anc he importance
of substructure discovery in the 5eld of artificial intelligence Seton 22 1eftnes a stbn~crWt
along with the components comprising a substructure [etods for geneatlng alte-1ative
substructures are presented 1ft Section 21 and the tK1niques used for inteHigently selecting among
these alternatives are discussed in Section 24 Once an appropriate substructure 15 discovered the
substructure is irurantUud for ncb occur-ence of the substructure in the input examples Sect~on
2 discusses substructure irstantiatl0ft ewly discovered sUOstuetures ~an be ipecialized to
describe larger substruc~ures in the input eumples Section 26 disc~sses suosrlctlre
s~aization Section 27 presents methods for using background lo~ledge in the SIoStrlctlre
dlscovery process Finllly Section 28 outlines a substluure Es-oery alsolthm lncorpcrawg
the previously described t~hni~ues
the observaton By tercelvmg only he appropriate Informatior lumans U~ bener abie to el-
the concepts implied by the environment
Thus the undedyu~ motivations for investigating substructure ~overy are the IblqlJlto~
hlenrchical structure in the world and 1e ease with Ntllch bumans can negotiate the henrchy
With an appropriate representation for substructures and the substJcture hierarchy a
substructure discovery sysem can perform the asks Just presented
22 Substructure Representation
A substructure is both a portion of a collection of structurally-related obects and a
cesclption of the concept represented by that portion For uample the detaied structure
CJmposlng ttle concet of a brIck is a substucure of a brck wall Tle collecion of atoms ard
bones compnslng he concept of an aromatic -ing 5 a substrucure of nany Clrgarllc ~cmpouncs
However substrucure concepts are ~ot always nteresng LPOl encountenng aJr1ck wal the
concept of a brick nay not be as interesting as tohe oncept Jf a doorway or Window T~e task of
SUls-ructure discovery is to ~nd i11lDurirtg subsluctue middotoncepts In a given 5-cicltlon d
structurally-related objects The representation of strllcured oncepts sbould be onducive to tile
wk of discovering substructures This sectior resents a grapblcal repfese~tatlon for suctufee
conceptS and a language for describing tne concepts
A coUlCtion of strJctured objects C3n be epfesentec by 1 sTaph CSlng il gnpl
representation a substJctlre is a collection of annotated nodes and eages comt=rlslng a onne~ed
slt~rlph J a larier jTaph The nodes epresenl single oects aics or eiltltols in the
Slbstuc~le lnd tlle edges -epresent the ion~lCtilrs emiddotmiddoteen ea~cns lnd ler arglrents Fr
annctated Ntb gt and o~e In~tl~ed Nltll gtft Te en odt s rnected c ~tl te a ngtce Jnd te
SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St
elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute
Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e
Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the
nput nampie n Figure 211 IS
lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt
red-I ill
green-i SI ~ ~------------~~~lt- --- OBJECT-0001
) ----Rl
1amp --- OBJECT-0002 amp-blue
I~l 154- red LshyI I
I j
blue blue
(1) I1put Example ()) SubstJcure
Fig1Jre 21 Exa~plt Substrlc~re
9
[nput Example Substlcture
A)
V
(b) Object Overlapping Substructure
(c) Object and Relation Overlapping Substructure
Figure 22 Disjoint and Overlappin Substructures
Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~
~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the
densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle
lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1
factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl
Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e
(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing
subsuIctures and substructure speialization to attach contextual nformation to newiy discovered
substructures
2S Substructure mstampl1tialioD
Once an interesting substructure is disltovered the input eumples can be recast by replactng
the obJects and relations of each occurrence of the substructure with a single entity representing
tbe abstact substructure This replacement is termed substructure illstanliatwlt After
pltrforming one substructure instantiation a substrueture discovery system may continue to
discover more abstract substructures in terms of those already instantiated in the input eumples
This section considers two methods of substrueture instantiation
One method of substructure instantiation replaees eaeb occurrence by a single object
Difficulties in using this method arise from representation roblems Although all the ob]fCts and
relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not
replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be
maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only
confounds the insuntiaion problem In the case of object overlap each instartation of the
overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of
relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as
well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre
instantiation using a singie new object Although single object substructure instantiation seens
intuitively promising tbe accompanying representation problems are difficult to overcome
An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of
the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the
slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the
17
3 SubStructure Generation
AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~
and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l
bottom-up and top-down
The bottom-up approach to substructure generation be~lns with the smallest substrJctures n
the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished
by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0
the substructure For enmple according to tbe tllee neighboring relations (presented in Section
2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the
substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures
lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt
lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt
lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt
The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O
S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For
example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10
suestrJures
ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt
The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie
suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0
smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be
11
lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1
jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_
uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly
dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec
substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1
are applicable here
Regardless of how tae metbods are applied each metbod bas advantages and disadvantages
Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev
SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle
combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a
larger substructure but a smaller more desirable substructure may be overlooked in the process
Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS
referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones
linimal expansion begins with smaller substructures expands substructures along one relation
lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource
imlts of tbe system
2~ Substructure Selection
Aier using tbe metbods of the previous sectlon to construct l set of alternatlve
substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0
onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e
roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie
substructures oased on ~her heuristic quality This section presents the major heuristics that 3re
Jpplicable to substructure evaluation
The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata
cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie
savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e
13
Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad
eiatlons of ~be origmal OCClrre1lces may be reconstructea
T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=
replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla
obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue
ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of
concepts deAned In teClS of the Instantiated substructures
26 Substructure Specialization
Specializing substructures is an essential capabtlity of a substructure discovery system For
Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f
mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences
Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the
aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached
atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this
specialization step allows the substructure discovery system to take advantage of additional
information in the input examples avoid learning overly general substructure concepts and Clore
rapidly discover a specific disjunctive substructure concet T115 section resents an approach to
substructure specialization
One technique for specializing a substructure is to ~rform inductive Inference on the
extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the
substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended
occurrences consists of the substructures obtained by upanding each occurrence llth each
neighboring relation of the occurrence Once the set of eltended occurrences is onstructed
Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added
neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg
relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres
19
2 - Background Knowledampe
Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s
wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e
discovery process along more promising paths through tbe space of alternatlve substructJres T~lS
section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate
substructures that are more likely to list in tbe current application dOClaln
The ~wo major functions of tbe substructure backround knowledge are to main tain boh
lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a
glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of
substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu
substructures are defined in terms of more primitive substructures This arrangement suggests ~n
archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie
substructures serve as the justifications for more complex substructures at a higher level in tle
hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote
substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie
substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures
elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying
tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput
uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are
ultimately justified by the relations in tbe input eumples The substructure discoey roess
may then use these background knowledge substructures as a starting point for ieneratllg
alternative substructures
Other f Jrms of background knowledge apply to the task of substrlcure ilscovery
mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses
~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee
1
L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do
BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do
if (not (member e 0raquo S S U leI
return D
Figure 24 SubstrUcture Discovery ~lgoritbm
Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom
the substructures In S until either the computational limlt L is exceeded or the set of candidate
substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe
heuristics of Section 2A are employed to choose the best substructure from the llternatives in S
The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he
implementation Section 322 descibes the comrutatlon used in StBDlE although otbe
methods may be used For instance the heutlStics could be weighted selected by lhe user or
selected by lhe backcround knowledae However uy substructure selecion method should
involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in
BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of
discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi
Section 23 are then used to construct a set of new substructures that are stored in E Those
subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the
loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res
23
CHAPTER 3
mE SUBDUE SYSTE~I
An implementation of he substructure discovery methodology descrIbed in the frevous
chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments
Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a
substructure concept discoverer and as a mod le in a more robust nachlne learning system In
addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a
sbstructure specialization module and a substructure background knowledge module for utiiizilg
prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd
krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres
ovillCh of these substruc1res are resent in the lnput examples
i
SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of
SubstructJre Speeializer-k of(
C ser-Supplied Input Exampies Substructure Discovery
Figure 31 Tlle SlSDLE System
(a) SUbstructure
[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]
(b) External Representation
I I
~ Sl
i V
~-T t -
I
~ Cl i
(c) Inte~al Represe~tation
Figure 32 Substructure Representation
21
SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do
SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI
return -EWSlBS
Figure 33 Substructure Elpanslon Algorithm
Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae
stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are
stored in For each neighboring relation a new substructure is formed by adding the nelghborrg
-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to
the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een
considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes
~xanded from the original substructure
322 Selection
The heuristic-based substructure discovery module selects for orslderation those
substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs
compactness connectivity and covenge These four heurlstus are used 0 order he set of
alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut
ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~
selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons
~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he
eurstlc value of a substructur
29
urlent set of Input xam-llS
deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e
substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le
Input examples
r rtlations Sl compactnns(S) = ----shy
For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus
conpactness bull 4 bull 1
The third heUristic connectivity measures the amount of extemal connection in 1he
occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of
external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be
onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed
connectivit-Spound) = locel
Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In
Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure
22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO
outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons
Thus connectivity (12 41 bull 13
T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es
31
ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt
First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~
tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as
there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle
substructure as denned in Section 3U The Object names within the substructures le aroltrar
symbols generated by the system for eacb ewly constructed substructure
S bull i lt 11 OBJECT-0001gt f
~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S
and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by
addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5
ae st)red In E
Input Example Substructure
amp i 511
~I Rl I- lSi ~
52 I S I
LJ L
Figure 3-1 Simple Example
33
11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~
1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l
le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~
suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~
by the algorabm
33 Sllbstructure Specialization
The substructure specialiution module In SLBDLE employs t simple teChlllq ue r
s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25
SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added
attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e
)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh
the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e
i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota
eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d
knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e
s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton
rocess
331 Specia1ization Algorithm
Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te
algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~
elamrles
he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~
Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l
osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~
eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e
35
~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor
Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle
mount of specializatlon In Srpc
A-rount_of_spKlaLization( S_) = --_______ (occl
The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture
acKiround knowledge along w1tb the originally discovered substrucure
332 Example
As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a
lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute
-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he
sare substruc~ure emerges as in Figure 3-
S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt
ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule
Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES
A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l
~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli
s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt
Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~
SPECSLBS In thIs example IS
S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt
T~is specialized substructure also las J occurrerces but 3 untque values thus
amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC
specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he
substrucnre background knowledge along with the oriiinally discovered substucur
SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon
about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be
s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred
substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq
SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue
nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization
nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be
mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec
s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks
3 SubstMlcture Background Kllowledge
T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts
39
O T SHA ETRL~GLE SHAPE-SQLARE
Figure 37 Substructure Backgolnd KnowledgeEumpie
ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt
su~ported
As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge
lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle
ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s
SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y
Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute
est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt
he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn
n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and
~1
1--- ~ed v blue ~
i I shyI
I~
I
I
I
I
SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE
Fgure 39 Three Background Knoledge $ucstroJcJres
he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e
new substructu-es In terms of the substructures already known
3 bull 42 IdentifyiD1 Substructures in Eumples
As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-
~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___
~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I
i SH~PETRIAGlE i t
I I
I I I SHAPEmiddotSQlARE
[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]
Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]
Fig-ure 310 Substructure IdentiAcation Eumple
substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles
disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks
--
lll=H Eumple - r
i III
I I H
8r- shyj
V V
Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red
C Value(S)a8
L 6 a
al uelt S)-312
alueltS)-21
0 alue(S)96
middotalue(S)-11
6 I
bt
10 ~ V
- -
I
alue(S)middot31~ aue(S~-96 -- - middotalue(S)92
A I
j
aI ValueS)-312 I
I I
I bull
---
- -I
Vaiue(S)-103
rgure 41 First Etample of EXiIrinent 1
elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS
of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll
EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$
overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0
consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r
be Input examples
2 Ex~riment 2 Specialization and Background Knowledge
The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem
Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0
~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS
lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege
During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task
As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut
discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs
acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa
Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed
subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk
The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure
-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e
cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s
shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese
discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized
substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed
rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces
the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed
substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe
background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to
deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe
background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres
Ii
C 0
H- C C - Ii C1
(Sr C1 rl
C C ~
C C C-H C C- ii C C-Ii 1 i
~ Sr- C C C
H-C C-Ii H C
Ii H
H
fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule
13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples
tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~
of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna
clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn
whIch to focus ile attentIon of the conceptual clusterI~ rocess
A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA
Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer
lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific
Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e
added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features
t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and
CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered
Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe
SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas
r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single
)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d
subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle
substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure
spannng ncre han one ualple
Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni
and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle
53
nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy
descrlOed il [10gensel87J the two best d usterngs discovered are
Color of engine Aheels IS Blackmiddot middotWhitemiddot
W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd
subs-~c~~re discovery =odule is
lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt
In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By
addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI
exa~Fles Je ~wo best lusterngs discovered lre
umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four
Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec
JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure
attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen
har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE
T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~
suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the
StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne
IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree
that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators
within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover
macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~
CiscJsses similrlties and differences between SLBDLE and PLoSD
Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure
of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The
Jperators forths conaln are taken from [isson80] and are repeated e10w
PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY
PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)
STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)
LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)
For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec
e
51
STACK(XZ)
~ CSTACKCYZ) PICKLPOi)
PCTDOW(Y)
Figure 49 Diseovered (alto-operator
system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy
vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be
~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex
45 Eltperiment 5 Data Abstraction and Feature Formation
EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he
cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3
arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng
on the same Tu3S Instruments Eplorer as the SLBDCE system
Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les
given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores
to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in
the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of
178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J
5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C
SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c
61
--
~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll
Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~
hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH
program searCles for tWo tpes of substrucure In tbe blocks world domain The first type
involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a
set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by
the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes
of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall
1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he
ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of
SlBDLE
lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S
cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce
OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence
~ ~ shyB
I ~ (
E G- c c=7 0 IA B CD
i Lr
(a) ( )
63
A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1
0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL
E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL
The tommcm-realiotships-list contairs the four relations Ossessed by more than half the
candidates
Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt
~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le
Sumber of propUiH In Intltwto
Number of propenlH n Ilnlon
vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-
ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre
A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5
65
~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e
ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~
difficult Running both tbe sequence and common relation discovery processes nore tban once
mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl
difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd
domain
T~e final comFarlson of 11e two syseS Involves ~he representation used to store the
discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the
subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe
sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot
lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1
roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse
Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines
comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe
semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto
oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0
SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert
background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l
more general epresentaticn such as the semantic network used y ~he ARCH pro~ran
53 COfDitive Optimization in olrs S~R Program
R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte
~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt
[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S
Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-
-(5-s)C =---shy
s
slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s
represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1
polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a
inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR
seeks a grammar whose eiementS m8llnize eVe
S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n
322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number
of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f
OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings
of he substrucre can be upressed as
Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy
cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not
conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on
he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into
~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture
lmprovements to tbe cognltive savings measure
~ Substructure Oiscovery ill Uihitehalls PLAD System
Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons
These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn
69
The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as
Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs
efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda
overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta
mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence
Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ
COXTEXT 1 ogsav macro OS
100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot
8 5 ~u - (~tmiddot z)middot
Instantiated Action Sequlnce lt B 112 Z ~11J Z
COlEXT 2 ogsav macro~s
20 ~l bull (~lumiddot Z)shy
Instantiated Action ~uence A B 11
Al interesting macrops discovered End
Figure 53 PLAXD Etam~le
1
OLPTER 6
COSC1USION
Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S
of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty
Url about such an environnent the entity nust abstract over unInuresting jetJil and discover
substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s
teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced
domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l
this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve
61 Summary
The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl
onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s
aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e
environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge
equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl
aleratves nvoiec In 3 computational method 01 discovering sustrucure
A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y
he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans
omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg
t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre
imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5
T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~
iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~
he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~
optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~
The ~ialization module modina tile substructure to apply in a more constrained ewironme
Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i
nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S
aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f
be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes
and the possible applications of the SLBDLE substructure discovery system in a vanety f
domaltls
Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf
~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-
lass vf problems in the area of nachine learning OJeraton In real-world domains demands f
eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates
n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll
of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes
Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1
S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~
62 Future Research
$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE
system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei
include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS
Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy
reVIOUS suess of silIlLu rJe appiiauns
esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of
substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can
tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]
lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can
be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions
in both the background knowledge architecture 3nd the opeations peforned by the discovery
algorlthm
Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess
may arise from a better method of substrlctie instantiation C rrent letbocs do not
satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation
by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C
lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres
exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba
Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods
would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery
process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt
the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be
discovered with a single application of the discovery algcnthr1
Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~
uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve
the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery
rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge
z~ess l ~
One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount
knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a
knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts
623 Applications
Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste
SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and
compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung
systems
One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage
composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e
eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon
slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of
abstraction
The n~ to access information fom larse databases has betooe oonlonlac In ncs
omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so
do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver
repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard
Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte
database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture
sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1
9
REFERE~CES
[Doyie79J
[HoaS3]
L --]~ ason
~Iicalski33bl
G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)
J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162
J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27
R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288
K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987
W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933
W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947
J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J
General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16
J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977
D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC
~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239
R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy
R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13
R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363
S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3
T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50
J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp
~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~
83
- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es
DeJ IS 11
dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T
s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11
nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11
race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T
Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order
=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je
~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form
(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I
The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC
The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure
ltela01s s the list ~f ~eatlons deining le substrucure r occurenc
In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage
dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec
85
gt bull ~
)~c1 tt I
0 c5 ~ -
13 Output for First Example of Experiment 1 Limit - 6
3qll rilebullbullbullbull
anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~
Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll
S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi
-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I
(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l
Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD
-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i
S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)
11 aCC
A14 Output for First Example of E(periment 1 Limit 10
gt svIdue liait 10)
n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t
Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco
T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~
S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO
~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS
~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J
A 16 Input for Second Example ot Experiment 1
Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J
conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)
Al7 Output for Second Example ot Etperitnent 1 Limit - 4
l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I
5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs
c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I
middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt
middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I
So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS
coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j
ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103
~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~
89
S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a
ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO
~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H
ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09
lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()
~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull
[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I
l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i
[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~
C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i
SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1
co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i
middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))
Ec ~lce
Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10
Betti ~Ice
i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~
umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1
SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -
1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G
91
s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no
TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~
~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f
=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ
~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i
A2 Experiment 2
Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd
saostructure background knowledge nodule Two examples from the chemical dooain ae
eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng
previously disc ered subst-uc~ures to subsequent discovery tasks
11 Input for First Eumple of poundXperimenl 2
kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i
S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))
slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)
sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J
~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI
Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))
VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)
( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I
(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i
(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I
sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)
Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j
SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)
~o e ) ~Wlnt ubstUC~oltes
Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo
95
gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~
U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l
~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J
carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )
Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))
Defxollrii ilill J ~cIl Clr carJ)
I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)
Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J
ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))
A32 Output for Experiment 3
gt lubclue inl~ JO)
Seli tlcebullbull_
m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil
SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo
aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot
ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~
licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)
Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc
wri OCCtilJtOiCS
101
4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4
~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i
(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))
42 Output for Experiment 4
PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil
Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)
ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl
u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1
O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~
s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI
W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -
~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a
S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1
TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a
103
~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~
Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-
bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)
ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)
I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))
r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l
(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)
(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)
coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)
Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i
(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at
Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e
(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ
~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj
OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs
I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten
~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull
9
SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1
I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1
Addina substructures 0 SKbullbullbull
O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1
5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I
A3 Experimeut 3
Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15
hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le
desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e
99
sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~
5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~
1~Ie c2 lt6 ~u)
QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32
c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)
lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J
SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))
A52 Output for Experiment S
gt (sulxh bullbull lmu )
Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll
RUMinl sulntucture discovey
DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl
Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~
WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11
( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I
(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I
[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt
~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90
Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J
ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot
~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I
10$
bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D
icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)
([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD
((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I
([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~
101
middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a
oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0
CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1
Ed ~lCe
109
the observaton By tercelvmg only he appropriate Informatior lumans U~ bener abie to el-
the concepts implied by the environment
Thus the undedyu~ motivations for investigating substructure ~overy are the IblqlJlto~
hlenrchical structure in the world and 1e ease with Ntllch bumans can negotiate the henrchy
With an appropriate representation for substructures and the substJcture hierarchy a
substructure discovery sysem can perform the asks Just presented
22 Substructure Representation
A substructure is both a portion of a collection of structurally-related obects and a
cesclption of the concept represented by that portion For uample the detaied structure
CJmposlng ttle concet of a brIck is a substucure of a brck wall Tle collecion of atoms ard
bones compnslng he concept of an aromatic -ing 5 a substrucure of nany Clrgarllc ~cmpouncs
However substrucure concepts are ~ot always nteresng LPOl encountenng aJr1ck wal the
concept of a brick nay not be as interesting as tohe oncept Jf a doorway or Window T~e task of
SUls-ructure discovery is to ~nd i11lDurirtg subsluctue middotoncepts In a given 5-cicltlon d
structurally-related objects The representation of strllcured oncepts sbould be onducive to tile
wk of discovering substructures This sectior resents a grapblcal repfese~tatlon for suctufee
conceptS and a language for describing tne concepts
A coUlCtion of strJctured objects C3n be epfesentec by 1 sTaph CSlng il gnpl
representation a substJctlre is a collection of annotated nodes and eages comt=rlslng a onne~ed
slt~rlph J a larier jTaph The nodes epresenl single oects aics or eiltltols in the
Slbstuc~le lnd tlle edges -epresent the ion~lCtilrs emiddotmiddoteen ea~cns lnd ler arglrents Fr
annctated Ntb gt and o~e In~tl~ed Nltll gtft Te en odt s rnected c ~tl te a ngtce Jnd te
SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St
elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute
Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e
Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the
nput nampie n Figure 211 IS
lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt
red-I ill
green-i SI ~ ~------------~~~lt- --- OBJECT-0001
) ----Rl
1amp --- OBJECT-0002 amp-blue
I~l 154- red LshyI I
I j
blue blue
(1) I1put Example ()) SubstJcure
Fig1Jre 21 Exa~plt Substrlc~re
9
[nput Example Substlcture
A)
V
(b) Object Overlapping Substructure
(c) Object and Relation Overlapping Substructure
Figure 22 Disjoint and Overlappin Substructures
Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~
~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the
densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle
lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1
factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl
Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e
(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing
subsuIctures and substructure speialization to attach contextual nformation to newiy discovered
substructures
2S Substructure mstampl1tialioD
Once an interesting substructure is disltovered the input eumples can be recast by replactng
the obJects and relations of each occurrence of the substructure with a single entity representing
tbe abstact substructure This replacement is termed substructure illstanliatwlt After
pltrforming one substructure instantiation a substrueture discovery system may continue to
discover more abstract substructures in terms of those already instantiated in the input eumples
This section considers two methods of substrueture instantiation
One method of substructure instantiation replaees eaeb occurrence by a single object
Difficulties in using this method arise from representation roblems Although all the ob]fCts and
relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not
replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be
maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only
confounds the insuntiaion problem In the case of object overlap each instartation of the
overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of
relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as
well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre
instantiation using a singie new object Although single object substructure instantiation seens
intuitively promising tbe accompanying representation problems are difficult to overcome
An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of
the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the
slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the
17
3 SubStructure Generation
AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~
and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l
bottom-up and top-down
The bottom-up approach to substructure generation be~lns with the smallest substrJctures n
the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished
by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0
the substructure For enmple according to tbe tllee neighboring relations (presented in Section
2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the
substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures
lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt
lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt
lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt
The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O
S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For
example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10
suestrJures
ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt
The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie
suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0
smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be
11
lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1
jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_
uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly
dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec
substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1
are applicable here
Regardless of how tae metbods are applied each metbod bas advantages and disadvantages
Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev
SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle
combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a
larger substructure but a smaller more desirable substructure may be overlooked in the process
Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS
referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones
linimal expansion begins with smaller substructures expands substructures along one relation
lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource
imlts of tbe system
2~ Substructure Selection
Aier using tbe metbods of the previous sectlon to construct l set of alternatlve
substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0
onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e
roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie
substructures oased on ~her heuristic quality This section presents the major heuristics that 3re
Jpplicable to substructure evaluation
The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata
cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie
savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e
13
Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad
eiatlons of ~be origmal OCClrre1lces may be reconstructea
T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=
replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla
obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue
ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of
concepts deAned In teClS of the Instantiated substructures
26 Substructure Specialization
Specializing substructures is an essential capabtlity of a substructure discovery system For
Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f
mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences
Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the
aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached
atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this
specialization step allows the substructure discovery system to take advantage of additional
information in the input examples avoid learning overly general substructure concepts and Clore
rapidly discover a specific disjunctive substructure concet T115 section resents an approach to
substructure specialization
One technique for specializing a substructure is to ~rform inductive Inference on the
extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the
substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended
occurrences consists of the substructures obtained by upanding each occurrence llth each
neighboring relation of the occurrence Once the set of eltended occurrences is onstructed
Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added
neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg
relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres
19
2 - Background Knowledampe
Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s
wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e
discovery process along more promising paths through tbe space of alternatlve substructJres T~lS
section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate
substructures that are more likely to list in tbe current application dOClaln
The ~wo major functions of tbe substructure backround knowledge are to main tain boh
lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a
glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of
substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu
substructures are defined in terms of more primitive substructures This arrangement suggests ~n
archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie
substructures serve as the justifications for more complex substructures at a higher level in tle
hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote
substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie
substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures
elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying
tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput
uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are
ultimately justified by the relations in tbe input eumples The substructure discoey roess
may then use these background knowledge substructures as a starting point for ieneratllg
alternative substructures
Other f Jrms of background knowledge apply to the task of substrlcure ilscovery
mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses
~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee
1
L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do
BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do
if (not (member e 0raquo S S U leI
return D
Figure 24 SubstrUcture Discovery ~lgoritbm
Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom
the substructures In S until either the computational limlt L is exceeded or the set of candidate
substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe
heuristics of Section 2A are employed to choose the best substructure from the llternatives in S
The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he
implementation Section 322 descibes the comrutatlon used in StBDlE although otbe
methods may be used For instance the heutlStics could be weighted selected by lhe user or
selected by lhe backcround knowledae However uy substructure selecion method should
involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in
BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of
discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi
Section 23 are then used to construct a set of new substructures that are stored in E Those
subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the
loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res
23
CHAPTER 3
mE SUBDUE SYSTE~I
An implementation of he substructure discovery methodology descrIbed in the frevous
chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments
Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a
substructure concept discoverer and as a mod le in a more robust nachlne learning system In
addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a
sbstructure specialization module and a substructure background knowledge module for utiiizilg
prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd
krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres
ovillCh of these substruc1res are resent in the lnput examples
i
SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of
SubstructJre Speeializer-k of(
C ser-Supplied Input Exampies Substructure Discovery
Figure 31 Tlle SlSDLE System
(a) SUbstructure
[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]
(b) External Representation
I I
~ Sl
i V
~-T t -
I
~ Cl i
(c) Inte~al Represe~tation
Figure 32 Substructure Representation
21
SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do
SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI
return -EWSlBS
Figure 33 Substructure Elpanslon Algorithm
Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae
stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are
stored in For each neighboring relation a new substructure is formed by adding the nelghborrg
-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to
the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een
considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes
~xanded from the original substructure
322 Selection
The heuristic-based substructure discovery module selects for orslderation those
substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs
compactness connectivity and covenge These four heurlstus are used 0 order he set of
alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut
ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~
selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons
~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he
eurstlc value of a substructur
29
urlent set of Input xam-llS
deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e
substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le
Input examples
r rtlations Sl compactnns(S) = ----shy
For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus
conpactness bull 4 bull 1
The third heUristic connectivity measures the amount of extemal connection in 1he
occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of
external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be
onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed
connectivit-Spound) = locel
Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In
Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure
22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO
outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons
Thus connectivity (12 41 bull 13
T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es
31
ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt
First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~
tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as
there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle
substructure as denned in Section 3U The Object names within the substructures le aroltrar
symbols generated by the system for eacb ewly constructed substructure
S bull i lt 11 OBJECT-0001gt f
~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S
and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by
addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5
ae st)red In E
Input Example Substructure
amp i 511
~I Rl I- lSi ~
52 I S I
LJ L
Figure 3-1 Simple Example
33
11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~
1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l
le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~
suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~
by the algorabm
33 Sllbstructure Specialization
The substructure specialiution module In SLBDLE employs t simple teChlllq ue r
s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25
SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added
attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e
)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh
the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e
i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota
eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d
knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e
s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton
rocess
331 Specia1ization Algorithm
Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te
algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~
elamrles
he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~
Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l
osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~
eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e
35
~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor
Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle
mount of specializatlon In Srpc
A-rount_of_spKlaLization( S_) = --_______ (occl
The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture
acKiround knowledge along w1tb the originally discovered substrucure
332 Example
As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a
lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute
-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he
sare substruc~ure emerges as in Figure 3-
S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt
ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule
Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES
A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l
~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli
s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt
Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~
SPECSLBS In thIs example IS
S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt
T~is specialized substructure also las J occurrerces but 3 untque values thus
amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC
specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he
substrucnre background knowledge along with the oriiinally discovered substucur
SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon
about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be
s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred
substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq
SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue
nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization
nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be
mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec
s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks
3 SubstMlcture Background Kllowledge
T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts
39
O T SHA ETRL~GLE SHAPE-SQLARE
Figure 37 Substructure Backgolnd KnowledgeEumpie
ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt
su~ported
As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge
lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle
ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s
SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y
Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute
est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt
he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn
n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and
~1
1--- ~ed v blue ~
i I shyI
I~
I
I
I
I
SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE
Fgure 39 Three Background Knoledge $ucstroJcJres
he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e
new substructu-es In terms of the substructures already known
3 bull 42 IdentifyiD1 Substructures in Eumples
As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-
~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___
~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I
i SH~PETRIAGlE i t
I I
I I I SHAPEmiddotSQlARE
[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]
Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]
Fig-ure 310 Substructure IdentiAcation Eumple
substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles
disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks
--
lll=H Eumple - r
i III
I I H
8r- shyj
V V
Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red
C Value(S)a8
L 6 a
al uelt S)-312
alueltS)-21
0 alue(S)96
middotalue(S)-11
6 I
bt
10 ~ V
- -
I
alue(S)middot31~ aue(S~-96 -- - middotalue(S)92
A I
j
aI ValueS)-312 I
I I
I bull
---
- -I
Vaiue(S)-103
rgure 41 First Etample of EXiIrinent 1
elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS
of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll
EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$
overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0
consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r
be Input examples
2 Ex~riment 2 Specialization and Background Knowledge
The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem
Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0
~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS
lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege
During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task
As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut
discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs
acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa
Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed
subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk
The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure
-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e
cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s
shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese
discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized
substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed
rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces
the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed
substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe
background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to
deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe
background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres
Ii
C 0
H- C C - Ii C1
(Sr C1 rl
C C ~
C C C-H C C- ii C C-Ii 1 i
~ Sr- C C C
H-C C-Ii H C
Ii H
H
fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule
13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples
tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~
of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna
clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn
whIch to focus ile attentIon of the conceptual clusterI~ rocess
A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA
Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer
lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific
Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e
added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features
t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and
CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered
Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe
SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas
r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single
)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d
subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle
substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure
spannng ncre han one ualple
Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni
and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle
53
nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy
descrlOed il [10gensel87J the two best d usterngs discovered are
Color of engine Aheels IS Blackmiddot middotWhitemiddot
W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd
subs-~c~~re discovery =odule is
lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt
In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By
addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI
exa~Fles Je ~wo best lusterngs discovered lre
umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four
Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec
JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure
attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen
har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE
T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~
suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the
StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne
IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree
that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators
within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover
macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~
CiscJsses similrlties and differences between SLBDLE and PLoSD
Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure
of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The
Jperators forths conaln are taken from [isson80] and are repeated e10w
PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY
PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)
STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)
LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)
For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec
e
51
STACK(XZ)
~ CSTACKCYZ) PICKLPOi)
PCTDOW(Y)
Figure 49 Diseovered (alto-operator
system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy
vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be
~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex
45 Eltperiment 5 Data Abstraction and Feature Formation
EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he
cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3
arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng
on the same Tu3S Instruments Eplorer as the SLBDCE system
Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les
given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores
to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in
the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of
178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J
5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C
SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c
61
--
~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll
Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~
hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH
program searCles for tWo tpes of substrucure In tbe blocks world domain The first type
involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a
set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by
the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes
of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall
1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he
ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of
SlBDLE
lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S
cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce
OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence
~ ~ shyB
I ~ (
E G- c c=7 0 IA B CD
i Lr
(a) ( )
63
A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1
0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL
E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL
The tommcm-realiotships-list contairs the four relations Ossessed by more than half the
candidates
Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt
~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le
Sumber of propUiH In Intltwto
Number of propenlH n Ilnlon
vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-
ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre
A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5
65
~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e
ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~
difficult Running both tbe sequence and common relation discovery processes nore tban once
mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl
difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd
domain
T~e final comFarlson of 11e two syseS Involves ~he representation used to store the
discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the
subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe
sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot
lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1
roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse
Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines
comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe
semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto
oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0
SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert
background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l
more general epresentaticn such as the semantic network used y ~he ARCH pro~ran
53 COfDitive Optimization in olrs S~R Program
R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte
~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt
[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S
Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-
-(5-s)C =---shy
s
slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s
represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1
polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a
inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR
seeks a grammar whose eiementS m8llnize eVe
S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n
322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number
of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f
OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings
of he substrucre can be upressed as
Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy
cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not
conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on
he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into
~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture
lmprovements to tbe cognltive savings measure
~ Substructure Oiscovery ill Uihitehalls PLAD System
Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons
These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn
69
The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as
Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs
efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda
overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta
mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence
Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ
COXTEXT 1 ogsav macro OS
100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot
8 5 ~u - (~tmiddot z)middot
Instantiated Action Sequlnce lt B 112 Z ~11J Z
COlEXT 2 ogsav macro~s
20 ~l bull (~lumiddot Z)shy
Instantiated Action ~uence A B 11
Al interesting macrops discovered End
Figure 53 PLAXD Etam~le
1
OLPTER 6
COSC1USION
Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S
of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty
Url about such an environnent the entity nust abstract over unInuresting jetJil and discover
substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s
teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced
domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l
this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve
61 Summary
The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl
onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s
aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e
environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge
equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl
aleratves nvoiec In 3 computational method 01 discovering sustrucure
A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y
he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans
omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg
t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre
imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5
T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~
iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~
he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~
optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~
The ~ialization module modina tile substructure to apply in a more constrained ewironme
Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i
nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S
aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f
be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes
and the possible applications of the SLBDLE substructure discovery system in a vanety f
domaltls
Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf
~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-
lass vf problems in the area of nachine learning OJeraton In real-world domains demands f
eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates
n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll
of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes
Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1
S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~
62 Future Research
$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE
system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei
include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS
Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy
reVIOUS suess of silIlLu rJe appiiauns
esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of
substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can
tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]
lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can
be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions
in both the background knowledge architecture 3nd the opeations peforned by the discovery
algorlthm
Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess
may arise from a better method of substrlctie instantiation C rrent letbocs do not
satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation
by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C
lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres
exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba
Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods
would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery
process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt
the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be
discovered with a single application of the discovery algcnthr1
Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~
uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve
the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery
rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge
z~ess l ~
One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount
knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a
knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts
623 Applications
Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste
SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and
compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung
systems
One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage
composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e
eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon
slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of
abstraction
The n~ to access information fom larse databases has betooe oonlonlac In ncs
omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so
do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver
repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard
Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte
database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture
sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1
9
REFERE~CES
[Doyie79J
[HoaS3]
L --]~ ason
~Iicalski33bl
G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)
J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162
J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27
R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288
K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987
W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933
W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947
J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J
General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16
J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977
D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC
~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239
R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy
R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13
R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363
S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3
T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50
J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp
~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~
83
- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es
DeJ IS 11
dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T
s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11
nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11
race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T
Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order
=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je
~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form
(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I
The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC
The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure
ltela01s s the list ~f ~eatlons deining le substrucure r occurenc
In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage
dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec
85
gt bull ~
)~c1 tt I
0 c5 ~ -
13 Output for First Example of Experiment 1 Limit - 6
3qll rilebullbullbullbull
anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~
Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll
S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi
-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I
(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l
Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD
-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i
S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)
11 aCC
A14 Output for First Example of E(periment 1 Limit 10
gt svIdue liait 10)
n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t
Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco
T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~
S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO
~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS
~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J
A 16 Input for Second Example ot Experiment 1
Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J
conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)
Al7 Output for Second Example ot Etperitnent 1 Limit - 4
l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I
5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs
c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I
middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt
middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I
So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS
coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j
ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103
~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~
89
S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a
ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO
~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H
ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09
lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()
~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull
[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I
l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i
[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~
C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i
SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1
co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i
middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))
Ec ~lce
Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10
Betti ~Ice
i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~
umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1
SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -
1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G
91
s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no
TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~
~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f
=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ
~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i
A2 Experiment 2
Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd
saostructure background knowledge nodule Two examples from the chemical dooain ae
eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng
previously disc ered subst-uc~ures to subsequent discovery tasks
11 Input for First Eumple of poundXperimenl 2
kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i
S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))
slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)
sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J
~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI
Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))
VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)
( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I
(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i
(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I
sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)
Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j
SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)
~o e ) ~Wlnt ubstUC~oltes
Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo
95
gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~
U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l
~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J
carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )
Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))
Defxollrii ilill J ~cIl Clr carJ)
I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)
Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J
ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))
A32 Output for Experiment 3
gt lubclue inl~ JO)
Seli tlcebullbull_
m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil
SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo
aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot
ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~
licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)
Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc
wri OCCtilJtOiCS
101
4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4
~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i
(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))
42 Output for Experiment 4
PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil
Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)
ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl
u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1
O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~
s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI
W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -
~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a
S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1
TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a
103
~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~
Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-
bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)
ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)
I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))
r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l
(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)
(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)
coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)
Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i
(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at
Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e
(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ
~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj
OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs
I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten
~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull
9
SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1
I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1
Addina substructures 0 SKbullbullbull
O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1
5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I
A3 Experimeut 3
Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15
hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le
desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e
99
sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~
5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~
1~Ie c2 lt6 ~u)
QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32
c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)
lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J
SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))
A52 Output for Experiment S
gt (sulxh bullbull lmu )
Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll
RUMinl sulntucture discovey
DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl
Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~
WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11
( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I
(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I
[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt
~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90
Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J
ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot
~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I
10$
bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D
icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)
([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD
((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I
([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~
101
middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a
oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0
CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1
Ed ~lCe
109
SFVbullP 6 lmiddotlgt lepresen~s a sqae )n p of sore lbJect ba~ ~as a ihaoe attrCllte bu l St
elresens a squae In ~O~ of some object that may oJr may 1ot bave a shcpe attrbute
Figure 21b Illustrates the substructure found ill the eumple shown In Figure 211 Botl e
Input example and the substructWe ue upressed in ~lle VL language Tlle expression Cor the
nput nampie n Figure 211 IS
lt[SHPE( Tl J-TRIAJG~E][SHAE(T )TRIA~GLEJ[SHAPE(TllTRIA-tGLE] [SHAPE(TJ 1TRIAlGLEJ[SHAPE( Sl l-SQLARE][SHA(S )SQtARE] [SHAPE( S3 )SQtA REJ[SHAPE( 54 )-SQtAREJ[SHAPE( R 1 J-REC ANGLE] [SHAPE Cl -CRCLE][COLOR( Tl )-~ED I[COLOR( T l-RED][COLOR(T3 )-SUE] [COLOR( T 3LtElpoundcOLOR(S11-GREEJ[COLORS)-SLtEl[COLOR(S3)aSLL1 [COLOR5- -~ED)[ON( 71S1 )TI[ONCSlRl JT](ON ClRlJT] (ON R llTJION RlT3 JTrOll(RlT 1T]CONCTS)Tl [ON(T3S3)-T][ONI T SJ )TJgt
red-I ill
green-i SI ~ ~------------~~~lt- --- OBJECT-0001
) ----Rl
1amp --- OBJECT-0002 amp-blue
I~l 154- red LshyI I
I j
blue blue
(1) I1put Example ()) SubstJcure
Fig1Jre 21 Exa~plt Substrlc~re
9
[nput Example Substlcture
A)
V
(b) Object Overlapping Substructure
(c) Object and Relation Overlapping Substructure
Figure 22 Disjoint and Overlappin Substructures
Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~
~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the
densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle
lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1
factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl
Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e
(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing
subsuIctures and substructure speialization to attach contextual nformation to newiy discovered
substructures
2S Substructure mstampl1tialioD
Once an interesting substructure is disltovered the input eumples can be recast by replactng
the obJects and relations of each occurrence of the substructure with a single entity representing
tbe abstact substructure This replacement is termed substructure illstanliatwlt After
pltrforming one substructure instantiation a substrueture discovery system may continue to
discover more abstract substructures in terms of those already instantiated in the input eumples
This section considers two methods of substrueture instantiation
One method of substructure instantiation replaees eaeb occurrence by a single object
Difficulties in using this method arise from representation roblems Although all the ob]fCts and
relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not
replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be
maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only
confounds the insuntiaion problem In the case of object overlap each instartation of the
overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of
relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as
well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre
instantiation using a singie new object Although single object substructure instantiation seens
intuitively promising tbe accompanying representation problems are difficult to overcome
An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of
the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the
slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the
17
3 SubStructure Generation
AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~
and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l
bottom-up and top-down
The bottom-up approach to substructure generation be~lns with the smallest substrJctures n
the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished
by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0
the substructure For enmple according to tbe tllee neighboring relations (presented in Section
2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the
substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures
lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt
lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt
lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt
The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O
S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For
example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10
suestrJures
ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt
The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie
suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0
smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be
11
lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1
jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_
uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly
dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec
substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1
are applicable here
Regardless of how tae metbods are applied each metbod bas advantages and disadvantages
Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev
SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle
combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a
larger substructure but a smaller more desirable substructure may be overlooked in the process
Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS
referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones
linimal expansion begins with smaller substructures expands substructures along one relation
lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource
imlts of tbe system
2~ Substructure Selection
Aier using tbe metbods of the previous sectlon to construct l set of alternatlve
substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0
onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e
roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie
substructures oased on ~her heuristic quality This section presents the major heuristics that 3re
Jpplicable to substructure evaluation
The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata
cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie
savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e
13
Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad
eiatlons of ~be origmal OCClrre1lces may be reconstructea
T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=
replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla
obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue
ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of
concepts deAned In teClS of the Instantiated substructures
26 Substructure Specialization
Specializing substructures is an essential capabtlity of a substructure discovery system For
Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f
mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences
Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the
aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached
atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this
specialization step allows the substructure discovery system to take advantage of additional
information in the input examples avoid learning overly general substructure concepts and Clore
rapidly discover a specific disjunctive substructure concet T115 section resents an approach to
substructure specialization
One technique for specializing a substructure is to ~rform inductive Inference on the
extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the
substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended
occurrences consists of the substructures obtained by upanding each occurrence llth each
neighboring relation of the occurrence Once the set of eltended occurrences is onstructed
Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added
neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg
relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres
19
2 - Background Knowledampe
Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s
wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e
discovery process along more promising paths through tbe space of alternatlve substructJres T~lS
section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate
substructures that are more likely to list in tbe current application dOClaln
The ~wo major functions of tbe substructure backround knowledge are to main tain boh
lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a
glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of
substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu
substructures are defined in terms of more primitive substructures This arrangement suggests ~n
archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie
substructures serve as the justifications for more complex substructures at a higher level in tle
hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote
substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie
substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures
elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying
tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput
uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are
ultimately justified by the relations in tbe input eumples The substructure discoey roess
may then use these background knowledge substructures as a starting point for ieneratllg
alternative substructures
Other f Jrms of background knowledge apply to the task of substrlcure ilscovery
mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses
~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee
1
L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do
BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do
if (not (member e 0raquo S S U leI
return D
Figure 24 SubstrUcture Discovery ~lgoritbm
Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom
the substructures In S until either the computational limlt L is exceeded or the set of candidate
substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe
heuristics of Section 2A are employed to choose the best substructure from the llternatives in S
The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he
implementation Section 322 descibes the comrutatlon used in StBDlE although otbe
methods may be used For instance the heutlStics could be weighted selected by lhe user or
selected by lhe backcround knowledae However uy substructure selecion method should
involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in
BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of
discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi
Section 23 are then used to construct a set of new substructures that are stored in E Those
subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the
loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res
23
CHAPTER 3
mE SUBDUE SYSTE~I
An implementation of he substructure discovery methodology descrIbed in the frevous
chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments
Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a
substructure concept discoverer and as a mod le in a more robust nachlne learning system In
addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a
sbstructure specialization module and a substructure background knowledge module for utiiizilg
prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd
krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres
ovillCh of these substruc1res are resent in the lnput examples
i
SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of
SubstructJre Speeializer-k of(
C ser-Supplied Input Exampies Substructure Discovery
Figure 31 Tlle SlSDLE System
(a) SUbstructure
[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]
(b) External Representation
I I
~ Sl
i V
~-T t -
I
~ Cl i
(c) Inte~al Represe~tation
Figure 32 Substructure Representation
21
SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do
SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI
return -EWSlBS
Figure 33 Substructure Elpanslon Algorithm
Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae
stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are
stored in For each neighboring relation a new substructure is formed by adding the nelghborrg
-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to
the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een
considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes
~xanded from the original substructure
322 Selection
The heuristic-based substructure discovery module selects for orslderation those
substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs
compactness connectivity and covenge These four heurlstus are used 0 order he set of
alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut
ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~
selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons
~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he
eurstlc value of a substructur
29
urlent set of Input xam-llS
deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e
substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le
Input examples
r rtlations Sl compactnns(S) = ----shy
For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus
conpactness bull 4 bull 1
The third heUristic connectivity measures the amount of extemal connection in 1he
occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of
external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be
onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed
connectivit-Spound) = locel
Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In
Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure
22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO
outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons
Thus connectivity (12 41 bull 13
T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es
31
ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt
First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~
tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as
there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle
substructure as denned in Section 3U The Object names within the substructures le aroltrar
symbols generated by the system for eacb ewly constructed substructure
S bull i lt 11 OBJECT-0001gt f
~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S
and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by
addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5
ae st)red In E
Input Example Substructure
amp i 511
~I Rl I- lSi ~
52 I S I
LJ L
Figure 3-1 Simple Example
33
11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~
1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l
le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~
suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~
by the algorabm
33 Sllbstructure Specialization
The substructure specialiution module In SLBDLE employs t simple teChlllq ue r
s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25
SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added
attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e
)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh
the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e
i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota
eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d
knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e
s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton
rocess
331 Specia1ization Algorithm
Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te
algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~
elamrles
he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~
Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l
osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~
eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e
35
~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor
Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle
mount of specializatlon In Srpc
A-rount_of_spKlaLization( S_) = --_______ (occl
The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture
acKiround knowledge along w1tb the originally discovered substrucure
332 Example
As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a
lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute
-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he
sare substruc~ure emerges as in Figure 3-
S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt
ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule
Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES
A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l
~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli
s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt
Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~
SPECSLBS In thIs example IS
S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt
T~is specialized substructure also las J occurrerces but 3 untque values thus
amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC
specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he
substrucnre background knowledge along with the oriiinally discovered substucur
SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon
about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be
s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred
substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq
SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue
nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization
nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be
mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec
s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks
3 SubstMlcture Background Kllowledge
T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts
39
O T SHA ETRL~GLE SHAPE-SQLARE
Figure 37 Substructure Backgolnd KnowledgeEumpie
ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt
su~ported
As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge
lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle
ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s
SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y
Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute
est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt
he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn
n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and
~1
1--- ~ed v blue ~
i I shyI
I~
I
I
I
I
SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE
Fgure 39 Three Background Knoledge $ucstroJcJres
he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e
new substructu-es In terms of the substructures already known
3 bull 42 IdentifyiD1 Substructures in Eumples
As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-
~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___
~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I
i SH~PETRIAGlE i t
I I
I I I SHAPEmiddotSQlARE
[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]
Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]
Fig-ure 310 Substructure IdentiAcation Eumple
substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles
disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks
--
lll=H Eumple - r
i III
I I H
8r- shyj
V V
Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red
C Value(S)a8
L 6 a
al uelt S)-312
alueltS)-21
0 alue(S)96
middotalue(S)-11
6 I
bt
10 ~ V
- -
I
alue(S)middot31~ aue(S~-96 -- - middotalue(S)92
A I
j
aI ValueS)-312 I
I I
I bull
---
- -I
Vaiue(S)-103
rgure 41 First Etample of EXiIrinent 1
elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS
of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll
EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$
overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0
consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r
be Input examples
2 Ex~riment 2 Specialization and Background Knowledge
The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem
Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0
~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS
lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege
During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task
As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut
discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs
acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa
Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed
subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk
The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure
-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e
cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s
shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese
discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized
substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed
rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces
the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed
substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe
background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to
deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe
background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres
Ii
C 0
H- C C - Ii C1
(Sr C1 rl
C C ~
C C C-H C C- ii C C-Ii 1 i
~ Sr- C C C
H-C C-Ii H C
Ii H
H
fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule
13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples
tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~
of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna
clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn
whIch to focus ile attentIon of the conceptual clusterI~ rocess
A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA
Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer
lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific
Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e
added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features
t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and
CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered
Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe
SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas
r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single
)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d
subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle
substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure
spannng ncre han one ualple
Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni
and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle
53
nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy
descrlOed il [10gensel87J the two best d usterngs discovered are
Color of engine Aheels IS Blackmiddot middotWhitemiddot
W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd
subs-~c~~re discovery =odule is
lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt
In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By
addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI
exa~Fles Je ~wo best lusterngs discovered lre
umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four
Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec
JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure
attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen
har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE
T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~
suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the
StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne
IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree
that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators
within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover
macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~
CiscJsses similrlties and differences between SLBDLE and PLoSD
Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure
of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The
Jperators forths conaln are taken from [isson80] and are repeated e10w
PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY
PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)
STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)
LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)
For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec
e
51
STACK(XZ)
~ CSTACKCYZ) PICKLPOi)
PCTDOW(Y)
Figure 49 Diseovered (alto-operator
system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy
vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be
~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex
45 Eltperiment 5 Data Abstraction and Feature Formation
EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he
cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3
arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng
on the same Tu3S Instruments Eplorer as the SLBDCE system
Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les
given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores
to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in
the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of
178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J
5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C
SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c
61
--
~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll
Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~
hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH
program searCles for tWo tpes of substrucure In tbe blocks world domain The first type
involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a
set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by
the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes
of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall
1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he
ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of
SlBDLE
lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S
cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce
OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence
~ ~ shyB
I ~ (
E G- c c=7 0 IA B CD
i Lr
(a) ( )
63
A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1
0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL
E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL
The tommcm-realiotships-list contairs the four relations Ossessed by more than half the
candidates
Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt
~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le
Sumber of propUiH In Intltwto
Number of propenlH n Ilnlon
vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-
ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre
A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5
65
~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e
ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~
difficult Running both tbe sequence and common relation discovery processes nore tban once
mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl
difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd
domain
T~e final comFarlson of 11e two syseS Involves ~he representation used to store the
discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the
subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe
sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot
lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1
roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse
Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines
comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe
semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto
oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0
SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert
background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l
more general epresentaticn such as the semantic network used y ~he ARCH pro~ran
53 COfDitive Optimization in olrs S~R Program
R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte
~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt
[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S
Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-
-(5-s)C =---shy
s
slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s
represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1
polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a
inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR
seeks a grammar whose eiementS m8llnize eVe
S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n
322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number
of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f
OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings
of he substrucre can be upressed as
Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy
cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not
conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on
he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into
~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture
lmprovements to tbe cognltive savings measure
~ Substructure Oiscovery ill Uihitehalls PLAD System
Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons
These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn
69
The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as
Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs
efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda
overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta
mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence
Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ
COXTEXT 1 ogsav macro OS
100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot
8 5 ~u - (~tmiddot z)middot
Instantiated Action Sequlnce lt B 112 Z ~11J Z
COlEXT 2 ogsav macro~s
20 ~l bull (~lumiddot Z)shy
Instantiated Action ~uence A B 11
Al interesting macrops discovered End
Figure 53 PLAXD Etam~le
1
OLPTER 6
COSC1USION
Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S
of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty
Url about such an environnent the entity nust abstract over unInuresting jetJil and discover
substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s
teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced
domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l
this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve
61 Summary
The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl
onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s
aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e
environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge
equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl
aleratves nvoiec In 3 computational method 01 discovering sustrucure
A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y
he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans
omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg
t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre
imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5
T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~
iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~
he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~
optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~
The ~ialization module modina tile substructure to apply in a more constrained ewironme
Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i
nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S
aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f
be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes
and the possible applications of the SLBDLE substructure discovery system in a vanety f
domaltls
Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf
~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-
lass vf problems in the area of nachine learning OJeraton In real-world domains demands f
eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates
n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll
of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes
Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1
S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~
62 Future Research
$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE
system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei
include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS
Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy
reVIOUS suess of silIlLu rJe appiiauns
esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of
substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can
tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]
lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can
be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions
in both the background knowledge architecture 3nd the opeations peforned by the discovery
algorlthm
Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess
may arise from a better method of substrlctie instantiation C rrent letbocs do not
satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation
by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C
lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres
exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba
Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods
would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery
process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt
the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be
discovered with a single application of the discovery algcnthr1
Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~
uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve
the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery
rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge
z~ess l ~
One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount
knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a
knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts
623 Applications
Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste
SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and
compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung
systems
One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage
composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e
eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon
slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of
abstraction
The n~ to access information fom larse databases has betooe oonlonlac In ncs
omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so
do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver
repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard
Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte
database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture
sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1
9
REFERE~CES
[Doyie79J
[HoaS3]
L --]~ ason
~Iicalski33bl
G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)
J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162
J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27
R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288
K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987
W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933
W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947
J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J
General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16
J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977
D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC
~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239
R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy
R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13
R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363
S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3
T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50
J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp
~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~
83
- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es
DeJ IS 11
dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T
s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11
nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11
race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T
Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order
=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je
~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form
(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I
The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC
The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure
ltela01s s the list ~f ~eatlons deining le substrucure r occurenc
In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage
dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec
85
gt bull ~
)~c1 tt I
0 c5 ~ -
13 Output for First Example of Experiment 1 Limit - 6
3qll rilebullbullbullbull
anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~
Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll
S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi
-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I
(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l
Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD
-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i
S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)
11 aCC
A14 Output for First Example of E(periment 1 Limit 10
gt svIdue liait 10)
n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t
Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco
T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~
S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO
~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS
~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J
A 16 Input for Second Example ot Experiment 1
Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J
conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)
Al7 Output for Second Example ot Etperitnent 1 Limit - 4
l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I
5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs
c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I
middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt
middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I
So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS
coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j
ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103
~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~
89
S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a
ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO
~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H
ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09
lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()
~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull
[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I
l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i
[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~
C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i
SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1
co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i
middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))
Ec ~lce
Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10
Betti ~Ice
i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~
umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1
SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -
1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G
91
s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no
TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~
~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f
=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ
~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i
A2 Experiment 2
Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd
saostructure background knowledge nodule Two examples from the chemical dooain ae
eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng
previously disc ered subst-uc~ures to subsequent discovery tasks
11 Input for First Eumple of poundXperimenl 2
kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i
S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))
slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)
sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J
~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI
Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))
VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)
( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I
(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i
(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I
sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)
Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j
SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)
~o e ) ~Wlnt ubstUC~oltes
Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo
95
gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~
U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l
~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J
carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )
Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))
Defxollrii ilill J ~cIl Clr carJ)
I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)
Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J
ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))
A32 Output for Experiment 3
gt lubclue inl~ JO)
Seli tlcebullbull_
m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil
SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo
aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot
ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~
licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)
Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc
wri OCCtilJtOiCS
101
4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4
~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i
(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))
42 Output for Experiment 4
PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil
Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)
ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl
u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1
O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~
s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI
W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -
~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a
S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1
TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a
103
~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~
Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-
bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)
ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)
I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))
r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l
(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)
(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)
coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)
Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i
(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at
Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e
(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ
~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj
OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs
I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten
~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull
9
SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1
I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1
Addina substructures 0 SKbullbullbull
O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1
5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I
A3 Experimeut 3
Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15
hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le
desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e
99
sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~
5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~
1~Ie c2 lt6 ~u)
QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32
c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)
lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J
SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))
A52 Output for Experiment S
gt (sulxh bullbull lmu )
Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll
RUMinl sulntucture discovey
DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl
Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~
WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11
( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I
(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I
[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt
~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90
Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J
ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot
~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I
10$
bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D
icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)
([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD
((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I
([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~
101
middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a
oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0
CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1
Ed ~lCe
109
[nput Example Substlcture
A)
V
(b) Object Overlapping Substructure
(c) Object and Relation Overlapping Substructure
Figure 22 Disjoint and Overlappin Substructures
Other substructure evaluation heuristics are adaptations of tbe cognItive savinis to rei1ec~
~ial qualities of 3 substructure One SUdl heuristic is eOrrtpGctMSS C)mpactless measures the
densitymiddot of a substructre This is not density in tbe pbysllal sense but tte densny based on tle
lumber jf relations per nUl1ber of objects in a subsuuctlre The c~mpacress heursuc 15 1
factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl
Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e
(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing
subsuIctures and substructure speialization to attach contextual nformation to newiy discovered
substructures
2S Substructure mstampl1tialioD
Once an interesting substructure is disltovered the input eumples can be recast by replactng
the obJects and relations of each occurrence of the substructure with a single entity representing
tbe abstact substructure This replacement is termed substructure illstanliatwlt After
pltrforming one substructure instantiation a substrueture discovery system may continue to
discover more abstract substructures in terms of those already instantiated in the input eumples
This section considers two methods of substrueture instantiation
One method of substructure instantiation replaees eaeb occurrence by a single object
Difficulties in using this method arise from representation roblems Although all the ob]fCts and
relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not
replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be
maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only
confounds the insuntiaion problem In the case of object overlap each instartation of the
overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of
relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as
well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre
instantiation using a singie new object Although single object substructure instantiation seens
intuitively promising tbe accompanying representation problems are difficult to overcome
An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of
the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the
slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the
17
3 SubStructure Generation
AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~
and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l
bottom-up and top-down
The bottom-up approach to substructure generation be~lns with the smallest substrJctures n
the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished
by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0
the substructure For enmple according to tbe tllee neighboring relations (presented in Section
2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the
substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures
lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt
lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt
lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt
The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O
S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For
example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10
suestrJures
ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt
The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie
suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0
smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be
11
lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1
jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_
uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly
dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec
substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1
are applicable here
Regardless of how tae metbods are applied each metbod bas advantages and disadvantages
Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev
SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle
combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a
larger substructure but a smaller more desirable substructure may be overlooked in the process
Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS
referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones
linimal expansion begins with smaller substructures expands substructures along one relation
lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource
imlts of tbe system
2~ Substructure Selection
Aier using tbe metbods of the previous sectlon to construct l set of alternatlve
substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0
onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e
roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie
substructures oased on ~her heuristic quality This section presents the major heuristics that 3re
Jpplicable to substructure evaluation
The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata
cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie
savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e
13
Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad
eiatlons of ~be origmal OCClrre1lces may be reconstructea
T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=
replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla
obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue
ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of
concepts deAned In teClS of the Instantiated substructures
26 Substructure Specialization
Specializing substructures is an essential capabtlity of a substructure discovery system For
Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f
mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences
Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the
aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached
atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this
specialization step allows the substructure discovery system to take advantage of additional
information in the input examples avoid learning overly general substructure concepts and Clore
rapidly discover a specific disjunctive substructure concet T115 section resents an approach to
substructure specialization
One technique for specializing a substructure is to ~rform inductive Inference on the
extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the
substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended
occurrences consists of the substructures obtained by upanding each occurrence llth each
neighboring relation of the occurrence Once the set of eltended occurrences is onstructed
Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added
neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg
relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres
19
2 - Background Knowledampe
Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s
wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e
discovery process along more promising paths through tbe space of alternatlve substructJres T~lS
section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate
substructures that are more likely to list in tbe current application dOClaln
The ~wo major functions of tbe substructure backround knowledge are to main tain boh
lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a
glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of
substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu
substructures are defined in terms of more primitive substructures This arrangement suggests ~n
archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie
substructures serve as the justifications for more complex substructures at a higher level in tle
hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote
substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie
substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures
elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying
tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput
uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are
ultimately justified by the relations in tbe input eumples The substructure discoey roess
may then use these background knowledge substructures as a starting point for ieneratllg
alternative substructures
Other f Jrms of background knowledge apply to the task of substrlcure ilscovery
mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses
~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee
1
L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do
BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do
if (not (member e 0raquo S S U leI
return D
Figure 24 SubstrUcture Discovery ~lgoritbm
Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom
the substructures In S until either the computational limlt L is exceeded or the set of candidate
substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe
heuristics of Section 2A are employed to choose the best substructure from the llternatives in S
The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he
implementation Section 322 descibes the comrutatlon used in StBDlE although otbe
methods may be used For instance the heutlStics could be weighted selected by lhe user or
selected by lhe backcround knowledae However uy substructure selecion method should
involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in
BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of
discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi
Section 23 are then used to construct a set of new substructures that are stored in E Those
subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the
loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res
23
CHAPTER 3
mE SUBDUE SYSTE~I
An implementation of he substructure discovery methodology descrIbed in the frevous
chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments
Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a
substructure concept discoverer and as a mod le in a more robust nachlne learning system In
addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a
sbstructure specialization module and a substructure background knowledge module for utiiizilg
prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd
krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres
ovillCh of these substruc1res are resent in the lnput examples
i
SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of
SubstructJre Speeializer-k of(
C ser-Supplied Input Exampies Substructure Discovery
Figure 31 Tlle SlSDLE System
(a) SUbstructure
[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]
(b) External Representation
I I
~ Sl
i V
~-T t -
I
~ Cl i
(c) Inte~al Represe~tation
Figure 32 Substructure Representation
21
SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do
SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI
return -EWSlBS
Figure 33 Substructure Elpanslon Algorithm
Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae
stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are
stored in For each neighboring relation a new substructure is formed by adding the nelghborrg
-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to
the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een
considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes
~xanded from the original substructure
322 Selection
The heuristic-based substructure discovery module selects for orslderation those
substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs
compactness connectivity and covenge These four heurlstus are used 0 order he set of
alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut
ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~
selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons
~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he
eurstlc value of a substructur
29
urlent set of Input xam-llS
deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e
substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le
Input examples
r rtlations Sl compactnns(S) = ----shy
For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus
conpactness bull 4 bull 1
The third heUristic connectivity measures the amount of extemal connection in 1he
occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of
external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be
onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed
connectivit-Spound) = locel
Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In
Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure
22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO
outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons
Thus connectivity (12 41 bull 13
T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es
31
ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt
First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~
tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as
there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle
substructure as denned in Section 3U The Object names within the substructures le aroltrar
symbols generated by the system for eacb ewly constructed substructure
S bull i lt 11 OBJECT-0001gt f
~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S
and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by
addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5
ae st)red In E
Input Example Substructure
amp i 511
~I Rl I- lSi ~
52 I S I
LJ L
Figure 3-1 Simple Example
33
11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~
1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l
le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~
suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~
by the algorabm
33 Sllbstructure Specialization
The substructure specialiution module In SLBDLE employs t simple teChlllq ue r
s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25
SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added
attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e
)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh
the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e
i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota
eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d
knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e
s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton
rocess
331 Specia1ization Algorithm
Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te
algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~
elamrles
he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~
Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l
osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~
eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e
35
~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor
Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle
mount of specializatlon In Srpc
A-rount_of_spKlaLization( S_) = --_______ (occl
The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture
acKiround knowledge along w1tb the originally discovered substrucure
332 Example
As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a
lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute
-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he
sare substruc~ure emerges as in Figure 3-
S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt
ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule
Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES
A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l
~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli
s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt
Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~
SPECSLBS In thIs example IS
S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt
T~is specialized substructure also las J occurrerces but 3 untque values thus
amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC
specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he
substrucnre background knowledge along with the oriiinally discovered substucur
SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon
about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be
s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred
substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq
SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue
nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization
nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be
mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec
s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks
3 SubstMlcture Background Kllowledge
T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts
39
O T SHA ETRL~GLE SHAPE-SQLARE
Figure 37 Substructure Backgolnd KnowledgeEumpie
ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt
su~ported
As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge
lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle
ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s
SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y
Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute
est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt
he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn
n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and
~1
1--- ~ed v blue ~
i I shyI
I~
I
I
I
I
SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE
Fgure 39 Three Background Knoledge $ucstroJcJres
he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e
new substructu-es In terms of the substructures already known
3 bull 42 IdentifyiD1 Substructures in Eumples
As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-
~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___
~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I
i SH~PETRIAGlE i t
I I
I I I SHAPEmiddotSQlARE
[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]
Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]
Fig-ure 310 Substructure IdentiAcation Eumple
substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles
disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks
--
lll=H Eumple - r
i III
I I H
8r- shyj
V V
Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red
C Value(S)a8
L 6 a
al uelt S)-312
alueltS)-21
0 alue(S)96
middotalue(S)-11
6 I
bt
10 ~ V
- -
I
alue(S)middot31~ aue(S~-96 -- - middotalue(S)92
A I
j
aI ValueS)-312 I
I I
I bull
---
- -I
Vaiue(S)-103
rgure 41 First Etample of EXiIrinent 1
elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS
of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll
EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$
overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0
consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r
be Input examples
2 Ex~riment 2 Specialization and Background Knowledge
The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem
Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0
~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS
lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege
During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task
As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut
discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs
acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa
Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed
subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk
The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure
-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e
cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s
shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese
discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized
substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed
rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces
the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed
substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe
background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to
deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe
background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres
Ii
C 0
H- C C - Ii C1
(Sr C1 rl
C C ~
C C C-H C C- ii C C-Ii 1 i
~ Sr- C C C
H-C C-Ii H C
Ii H
H
fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule
13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples
tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~
of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna
clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn
whIch to focus ile attentIon of the conceptual clusterI~ rocess
A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA
Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer
lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific
Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e
added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features
t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and
CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered
Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe
SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas
r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single
)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d
subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle
substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure
spannng ncre han one ualple
Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni
and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle
53
nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy
descrlOed il [10gensel87J the two best d usterngs discovered are
Color of engine Aheels IS Blackmiddot middotWhitemiddot
W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd
subs-~c~~re discovery =odule is
lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt
In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By
addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI
exa~Fles Je ~wo best lusterngs discovered lre
umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four
Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec
JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure
attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen
har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE
T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~
suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the
StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne
IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree
that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators
within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover
macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~
CiscJsses similrlties and differences between SLBDLE and PLoSD
Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure
of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The
Jperators forths conaln are taken from [isson80] and are repeated e10w
PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY
PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)
STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)
LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)
For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec
e
51
STACK(XZ)
~ CSTACKCYZ) PICKLPOi)
PCTDOW(Y)
Figure 49 Diseovered (alto-operator
system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy
vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be
~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex
45 Eltperiment 5 Data Abstraction and Feature Formation
EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he
cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3
arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng
on the same Tu3S Instruments Eplorer as the SLBDCE system
Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les
given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores
to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in
the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of
178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J
5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C
SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c
61
--
~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll
Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~
hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH
program searCles for tWo tpes of substrucure In tbe blocks world domain The first type
involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a
set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by
the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes
of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall
1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he
ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of
SlBDLE
lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S
cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce
OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence
~ ~ shyB
I ~ (
E G- c c=7 0 IA B CD
i Lr
(a) ( )
63
A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1
0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL
E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL
The tommcm-realiotships-list contairs the four relations Ossessed by more than half the
candidates
Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt
~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le
Sumber of propUiH In Intltwto
Number of propenlH n Ilnlon
vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-
ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre
A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5
65
~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e
ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~
difficult Running both tbe sequence and common relation discovery processes nore tban once
mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl
difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd
domain
T~e final comFarlson of 11e two syseS Involves ~he representation used to store the
discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the
subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe
sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot
lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1
roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse
Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines
comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe
semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto
oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0
SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert
background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l
more general epresentaticn such as the semantic network used y ~he ARCH pro~ran
53 COfDitive Optimization in olrs S~R Program
R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte
~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt
[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S
Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-
-(5-s)C =---shy
s
slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s
represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1
polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a
inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR
seeks a grammar whose eiementS m8llnize eVe
S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n
322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number
of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f
OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings
of he substrucre can be upressed as
Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy
cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not
conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on
he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into
~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture
lmprovements to tbe cognltive savings measure
~ Substructure Oiscovery ill Uihitehalls PLAD System
Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons
These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn
69
The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as
Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs
efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda
overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta
mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence
Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ
COXTEXT 1 ogsav macro OS
100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot
8 5 ~u - (~tmiddot z)middot
Instantiated Action Sequlnce lt B 112 Z ~11J Z
COlEXT 2 ogsav macro~s
20 ~l bull (~lumiddot Z)shy
Instantiated Action ~uence A B 11
Al interesting macrops discovered End
Figure 53 PLAXD Etam~le
1
OLPTER 6
COSC1USION
Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S
of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty
Url about such an environnent the entity nust abstract over unInuresting jetJil and discover
substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s
teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced
domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l
this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve
61 Summary
The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl
onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s
aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e
environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge
equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl
aleratves nvoiec In 3 computational method 01 discovering sustrucure
A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y
he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans
omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg
t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre
imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5
T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~
iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~
he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~
optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~
The ~ialization module modina tile substructure to apply in a more constrained ewironme
Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i
nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S
aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f
be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes
and the possible applications of the SLBDLE substructure discovery system in a vanety f
domaltls
Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf
~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-
lass vf problems in the area of nachine learning OJeraton In real-world domains demands f
eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates
n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll
of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes
Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1
S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~
62 Future Research
$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE
system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei
include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS
Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy
reVIOUS suess of silIlLu rJe appiiauns
esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of
substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can
tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]
lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can
be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions
in both the background knowledge architecture 3nd the opeations peforned by the discovery
algorlthm
Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess
may arise from a better method of substrlctie instantiation C rrent letbocs do not
satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation
by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C
lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres
exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba
Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods
would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery
process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt
the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be
discovered with a single application of the discovery algcnthr1
Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~
uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve
the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery
rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge
z~ess l ~
One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount
knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a
knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts
623 Applications
Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste
SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and
compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung
systems
One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage
composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e
eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon
slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of
abstraction
The n~ to access information fom larse databases has betooe oonlonlac In ncs
omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so
do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver
repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard
Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte
database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture
sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1
9
REFERE~CES
[Doyie79J
[HoaS3]
L --]~ ason
~Iicalski33bl
G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)
J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162
J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27
R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288
K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987
W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933
W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947
J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J
General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16
J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977
D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC
~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239
R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy
R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13
R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363
S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3
T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50
J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp
~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~
83
- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es
DeJ IS 11
dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T
s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11
nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11
race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T
Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order
=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je
~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form
(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I
The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC
The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure
ltela01s s the list ~f ~eatlons deining le substrucure r occurenc
In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage
dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec
85
gt bull ~
)~c1 tt I
0 c5 ~ -
13 Output for First Example of Experiment 1 Limit - 6
3qll rilebullbullbullbull
anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~
Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll
S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi
-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I
(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l
Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD
-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i
S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)
11 aCC
A14 Output for First Example of E(periment 1 Limit 10
gt svIdue liait 10)
n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t
Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco
T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~
S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO
~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS
~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J
A 16 Input for Second Example ot Experiment 1
Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J
conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)
Al7 Output for Second Example ot Etperitnent 1 Limit - 4
l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I
5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs
c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I
middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt
middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I
So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS
coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j
ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103
~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~
89
S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a
ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO
~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H
ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09
lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()
~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull
[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I
l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i
[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~
C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i
SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1
co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i
middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))
Ec ~lce
Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10
Betti ~Ice
i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~
umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1
SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -
1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G
91
s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no
TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~
~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f
=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ
~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i
A2 Experiment 2
Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd
saostructure background knowledge nodule Two examples from the chemical dooain ae
eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng
previously disc ered subst-uc~ures to subsequent discovery tasks
11 Input for First Eumple of poundXperimenl 2
kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i
S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))
slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)
sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J
~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI
Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))
VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)
( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I
(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i
(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I
sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)
Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j
SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)
~o e ) ~Wlnt ubstUC~oltes
Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo
95
gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~
U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l
~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J
carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )
Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))
Defxollrii ilill J ~cIl Clr carJ)
I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)
Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J
ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))
A32 Output for Experiment 3
gt lubclue inl~ JO)
Seli tlcebullbull_
m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil
SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo
aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot
ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~
licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)
Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc
wri OCCtilJtOiCS
101
4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4
~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i
(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))
42 Output for Experiment 4
PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil
Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)
ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl
u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1
O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~
s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI
W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -
~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a
S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1
TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a
103
~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~
Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-
bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)
ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)
I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))
r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l
(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)
(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)
coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)
Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i
(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at
Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e
(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ
~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj
OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs
I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten
~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull
9
SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1
I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1
Addina substructures 0 SKbullbullbull
O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1
5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I
A3 Experimeut 3
Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15
hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le
desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e
99
sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~
5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~
1~Ie c2 lt6 ~u)
QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32
c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)
lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J
SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))
A52 Output for Experiment S
gt (sulxh bullbull lmu )
Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll
RUMinl sulntucture discovey
DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl
Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~
WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11
( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I
(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I
[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt
~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90
Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J
ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot
~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I
10$
bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D
icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)
([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD
((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I
([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~
101
middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a
oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0
CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1
Ed ~lCe
109
factors ldentlted in bumJl ferceptJon tl1at may apply c nbst-JctJre eV l-aLCll )herl
Werthelmer39] The SlBDCE system descrbed II Cla~ter J eisCJmiddot eros substncres cy Slng e
(Jur he-rist~cs preselted n bis sfCtion along wab backgrould klcwledge to suggest pronSing
subsuIctures and substructure speialization to attach contextual nformation to newiy discovered
substructures
2S Substructure mstampl1tialioD
Once an interesting substructure is disltovered the input eumples can be recast by replactng
the obJects and relations of each occurrence of the substructure with a single entity representing
tbe abstact substructure This replacement is termed substructure illstanliatwlt After
pltrforming one substructure instantiation a substrueture discovery system may continue to
discover more abstract substructures in terms of those already instantiated in the input eumples
This section considers two methods of substrueture instantiation
One method of substructure instantiation replaees eaeb occurrence by a single object
Difficulties in using this method arise from representation roblems Although all the ob]fCts and
relations of the oecurrence ue replaced by a singe object the reghbcrng relatiOns are not
replaced Therefore some recollection of the objects involved in the neighboring reiatlons must be
maintained Also the possibility of overlappinc occurrences as described in Slaquouon 24 only
confounds the insuntiaion problem In the case of object overlap each instartation of the
overlapping oceurrenees must remember the overlapptnc components Similarly in tile case of
relation overlap not only must the overlapping objects be retainect but tbe overlapping relations as
well Retaining tbe estra object information is inconslStent ~ith ihe dea oi substructJre
instantiation using a singie new object Although single object substructure instantiation seens
intuitively promising tbe accompanying representation problems are difficult to overcome
An alternative approach to subsuueture instantiation involves replaCing eact OCCmiddotence of
the substructure with a rew elation The lr~middotlments to tlis relitlon are all beb]ecs in the
slbstructure occur--nce Ourrg instantiation all re1atons r il xcurrerce are rercved from the
17
3 SubStructure Generation
AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~
and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l
bottom-up and top-down
The bottom-up approach to substructure generation be~lns with the smallest substrJctures n
the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished
by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0
the substructure For enmple according to tbe tllee neighboring relations (presented in Section
2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the
substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures
lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt
lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt
lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt
The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O
S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For
example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10
suestrJures
ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt
The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie
suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0
smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be
11
lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1
jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_
uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly
dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec
substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1
are applicable here
Regardless of how tae metbods are applied each metbod bas advantages and disadvantages
Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev
SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle
combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a
larger substructure but a smaller more desirable substructure may be overlooked in the process
Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS
referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones
linimal expansion begins with smaller substructures expands substructures along one relation
lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource
imlts of tbe system
2~ Substructure Selection
Aier using tbe metbods of the previous sectlon to construct l set of alternatlve
substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0
onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e
roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie
substructures oased on ~her heuristic quality This section presents the major heuristics that 3re
Jpplicable to substructure evaluation
The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata
cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie
savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e
13
Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad
eiatlons of ~be origmal OCClrre1lces may be reconstructea
T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=
replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla
obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue
ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of
concepts deAned In teClS of the Instantiated substructures
26 Substructure Specialization
Specializing substructures is an essential capabtlity of a substructure discovery system For
Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f
mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences
Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the
aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached
atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this
specialization step allows the substructure discovery system to take advantage of additional
information in the input examples avoid learning overly general substructure concepts and Clore
rapidly discover a specific disjunctive substructure concet T115 section resents an approach to
substructure specialization
One technique for specializing a substructure is to ~rform inductive Inference on the
extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the
substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended
occurrences consists of the substructures obtained by upanding each occurrence llth each
neighboring relation of the occurrence Once the set of eltended occurrences is onstructed
Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added
neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg
relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres
19
2 - Background Knowledampe
Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s
wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e
discovery process along more promising paths through tbe space of alternatlve substructJres T~lS
section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate
substructures that are more likely to list in tbe current application dOClaln
The ~wo major functions of tbe substructure backround knowledge are to main tain boh
lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a
glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of
substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu
substructures are defined in terms of more primitive substructures This arrangement suggests ~n
archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie
substructures serve as the justifications for more complex substructures at a higher level in tle
hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote
substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie
substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures
elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying
tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput
uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are
ultimately justified by the relations in tbe input eumples The substructure discoey roess
may then use these background knowledge substructures as a starting point for ieneratllg
alternative substructures
Other f Jrms of background knowledge apply to the task of substrlcure ilscovery
mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses
~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee
1
L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do
BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do
if (not (member e 0raquo S S U leI
return D
Figure 24 SubstrUcture Discovery ~lgoritbm
Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom
the substructures In S until either the computational limlt L is exceeded or the set of candidate
substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe
heuristics of Section 2A are employed to choose the best substructure from the llternatives in S
The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he
implementation Section 322 descibes the comrutatlon used in StBDlE although otbe
methods may be used For instance the heutlStics could be weighted selected by lhe user or
selected by lhe backcround knowledae However uy substructure selecion method should
involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in
BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of
discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi
Section 23 are then used to construct a set of new substructures that are stored in E Those
subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the
loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res
23
CHAPTER 3
mE SUBDUE SYSTE~I
An implementation of he substructure discovery methodology descrIbed in the frevous
chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments
Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a
substructure concept discoverer and as a mod le in a more robust nachlne learning system In
addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a
sbstructure specialization module and a substructure background knowledge module for utiiizilg
prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd
krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres
ovillCh of these substruc1res are resent in the lnput examples
i
SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of
SubstructJre Speeializer-k of(
C ser-Supplied Input Exampies Substructure Discovery
Figure 31 Tlle SlSDLE System
(a) SUbstructure
[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]
(b) External Representation
I I
~ Sl
i V
~-T t -
I
~ Cl i
(c) Inte~al Represe~tation
Figure 32 Substructure Representation
21
SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do
SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI
return -EWSlBS
Figure 33 Substructure Elpanslon Algorithm
Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae
stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are
stored in For each neighboring relation a new substructure is formed by adding the nelghborrg
-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to
the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een
considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes
~xanded from the original substructure
322 Selection
The heuristic-based substructure discovery module selects for orslderation those
substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs
compactness connectivity and covenge These four heurlstus are used 0 order he set of
alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut
ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~
selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons
~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he
eurstlc value of a substructur
29
urlent set of Input xam-llS
deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e
substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le
Input examples
r rtlations Sl compactnns(S) = ----shy
For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus
conpactness bull 4 bull 1
The third heUristic connectivity measures the amount of extemal connection in 1he
occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of
external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be
onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed
connectivit-Spound) = locel
Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In
Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure
22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO
outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons
Thus connectivity (12 41 bull 13
T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es
31
ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt
First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~
tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as
there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle
substructure as denned in Section 3U The Object names within the substructures le aroltrar
symbols generated by the system for eacb ewly constructed substructure
S bull i lt 11 OBJECT-0001gt f
~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S
and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by
addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5
ae st)red In E
Input Example Substructure
amp i 511
~I Rl I- lSi ~
52 I S I
LJ L
Figure 3-1 Simple Example
33
11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~
1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l
le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~
suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~
by the algorabm
33 Sllbstructure Specialization
The substructure specialiution module In SLBDLE employs t simple teChlllq ue r
s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25
SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added
attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e
)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh
the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e
i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota
eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d
knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e
s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton
rocess
331 Specia1ization Algorithm
Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te
algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~
elamrles
he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~
Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l
osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~
eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e
35
~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor
Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle
mount of specializatlon In Srpc
A-rount_of_spKlaLization( S_) = --_______ (occl
The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture
acKiround knowledge along w1tb the originally discovered substrucure
332 Example
As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a
lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute
-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he
sare substruc~ure emerges as in Figure 3-
S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt
ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule
Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES
A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l
~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli
s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt
Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~
SPECSLBS In thIs example IS
S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt
T~is specialized substructure also las J occurrerces but 3 untque values thus
amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC
specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he
substrucnre background knowledge along with the oriiinally discovered substucur
SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon
about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be
s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred
substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq
SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue
nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization
nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be
mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec
s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks
3 SubstMlcture Background Kllowledge
T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts
39
O T SHA ETRL~GLE SHAPE-SQLARE
Figure 37 Substructure Backgolnd KnowledgeEumpie
ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt
su~ported
As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge
lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle
ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s
SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y
Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute
est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt
he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn
n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and
~1
1--- ~ed v blue ~
i I shyI
I~
I
I
I
I
SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE
Fgure 39 Three Background Knoledge $ucstroJcJres
he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e
new substructu-es In terms of the substructures already known
3 bull 42 IdentifyiD1 Substructures in Eumples
As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-
~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___
~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I
i SH~PETRIAGlE i t
I I
I I I SHAPEmiddotSQlARE
[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]
Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]
Fig-ure 310 Substructure IdentiAcation Eumple
substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles
disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks
--
lll=H Eumple - r
i III
I I H
8r- shyj
V V
Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red
C Value(S)a8
L 6 a
al uelt S)-312
alueltS)-21
0 alue(S)96
middotalue(S)-11
6 I
bt
10 ~ V
- -
I
alue(S)middot31~ aue(S~-96 -- - middotalue(S)92
A I
j
aI ValueS)-312 I
I I
I bull
---
- -I
Vaiue(S)-103
rgure 41 First Etample of EXiIrinent 1
elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS
of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll
EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$
overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0
consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r
be Input examples
2 Ex~riment 2 Specialization and Background Knowledge
The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem
Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0
~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS
lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege
During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task
As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut
discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs
acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa
Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed
subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk
The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure
-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e
cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s
shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese
discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized
substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed
rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces
the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed
substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe
background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to
deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe
background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres
Ii
C 0
H- C C - Ii C1
(Sr C1 rl
C C ~
C C C-H C C- ii C C-Ii 1 i
~ Sr- C C C
H-C C-Ii H C
Ii H
H
fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule
13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples
tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~
of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna
clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn
whIch to focus ile attentIon of the conceptual clusterI~ rocess
A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA
Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer
lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific
Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e
added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features
t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and
CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered
Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe
SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas
r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single
)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d
subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle
substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure
spannng ncre han one ualple
Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni
and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle
53
nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy
descrlOed il [10gensel87J the two best d usterngs discovered are
Color of engine Aheels IS Blackmiddot middotWhitemiddot
W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd
subs-~c~~re discovery =odule is
lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt
In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By
addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI
exa~Fles Je ~wo best lusterngs discovered lre
umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four
Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec
JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure
attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen
har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE
T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~
suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the
StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne
IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree
that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators
within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover
macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~
CiscJsses similrlties and differences between SLBDLE and PLoSD
Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure
of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The
Jperators forths conaln are taken from [isson80] and are repeated e10w
PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY
PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)
STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)
LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)
For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec
e
51
STACK(XZ)
~ CSTACKCYZ) PICKLPOi)
PCTDOW(Y)
Figure 49 Diseovered (alto-operator
system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy
vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be
~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex
45 Eltperiment 5 Data Abstraction and Feature Formation
EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he
cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3
arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng
on the same Tu3S Instruments Eplorer as the SLBDCE system
Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les
given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores
to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in
the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of
178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J
5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C
SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c
61
--
~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll
Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~
hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH
program searCles for tWo tpes of substrucure In tbe blocks world domain The first type
involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a
set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by
the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes
of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall
1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he
ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of
SlBDLE
lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S
cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce
OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence
~ ~ shyB
I ~ (
E G- c c=7 0 IA B CD
i Lr
(a) ( )
63
A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1
0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL
E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL
The tommcm-realiotships-list contairs the four relations Ossessed by more than half the
candidates
Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt
~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le
Sumber of propUiH In Intltwto
Number of propenlH n Ilnlon
vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-
ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre
A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5
65
~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e
ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~
difficult Running both tbe sequence and common relation discovery processes nore tban once
mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl
difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd
domain
T~e final comFarlson of 11e two syseS Involves ~he representation used to store the
discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the
subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe
sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot
lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1
roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse
Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines
comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe
semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto
oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0
SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert
background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l
more general epresentaticn such as the semantic network used y ~he ARCH pro~ran
53 COfDitive Optimization in olrs S~R Program
R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte
~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt
[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S
Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-
-(5-s)C =---shy
s
slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s
represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1
polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a
inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR
seeks a grammar whose eiementS m8llnize eVe
S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n
322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number
of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f
OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings
of he substrucre can be upressed as
Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy
cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not
conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on
he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into
~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture
lmprovements to tbe cognltive savings measure
~ Substructure Oiscovery ill Uihitehalls PLAD System
Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons
These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn
69
The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as
Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs
efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda
overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta
mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence
Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ
COXTEXT 1 ogsav macro OS
100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot
8 5 ~u - (~tmiddot z)middot
Instantiated Action Sequlnce lt B 112 Z ~11J Z
COlEXT 2 ogsav macro~s
20 ~l bull (~lumiddot Z)shy
Instantiated Action ~uence A B 11
Al interesting macrops discovered End
Figure 53 PLAXD Etam~le
1
OLPTER 6
COSC1USION
Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S
of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty
Url about such an environnent the entity nust abstract over unInuresting jetJil and discover
substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s
teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced
domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l
this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve
61 Summary
The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl
onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s
aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e
environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge
equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl
aleratves nvoiec In 3 computational method 01 discovering sustrucure
A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y
he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans
omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg
t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre
imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5
T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~
iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~
he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~
optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~
The ~ialization module modina tile substructure to apply in a more constrained ewironme
Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i
nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S
aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f
be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes
and the possible applications of the SLBDLE substructure discovery system in a vanety f
domaltls
Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf
~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-
lass vf problems in the area of nachine learning OJeraton In real-world domains demands f
eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates
n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll
of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes
Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1
S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~
62 Future Research
$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE
system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei
include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS
Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy
reVIOUS suess of silIlLu rJe appiiauns
esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of
substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can
tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]
lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can
be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions
in both the background knowledge architecture 3nd the opeations peforned by the discovery
algorlthm
Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess
may arise from a better method of substrlctie instantiation C rrent letbocs do not
satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation
by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C
lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres
exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba
Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods
would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery
process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt
the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be
discovered with a single application of the discovery algcnthr1
Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~
uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve
the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery
rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge
z~ess l ~
One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount
knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a
knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts
623 Applications
Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste
SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and
compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung
systems
One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage
composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e
eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon
slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of
abstraction
The n~ to access information fom larse databases has betooe oonlonlac In ncs
omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so
do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver
repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard
Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte
database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture
sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1
9
REFERE~CES
[Doyie79J
[HoaS3]
L --]~ ason
~Iicalski33bl
G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)
J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162
J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27
R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288
K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987
W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933
W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947
J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J
General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16
J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977
D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC
~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239
R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy
R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13
R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363
S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3
T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50
J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp
~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~
83
- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es
DeJ IS 11
dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T
s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11
nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11
race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T
Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order
=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je
~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form
(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I
The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC
The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure
ltela01s s the list ~f ~eatlons deining le substrucure r occurenc
In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage
dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec
85
gt bull ~
)~c1 tt I
0 c5 ~ -
13 Output for First Example of Experiment 1 Limit - 6
3qll rilebullbullbullbull
anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~
Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll
S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi
-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I
(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l
Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD
-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i
S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)
11 aCC
A14 Output for First Example of E(periment 1 Limit 10
gt svIdue liait 10)
n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t
Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco
T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~
S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO
~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS
~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J
A 16 Input for Second Example ot Experiment 1
Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J
conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)
Al7 Output for Second Example ot Etperitnent 1 Limit - 4
l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I
5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs
c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I
middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt
middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I
So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS
coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j
ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103
~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~
89
S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a
ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO
~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H
ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09
lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()
~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull
[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I
l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i
[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~
C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i
SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1
co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i
middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))
Ec ~lce
Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10
Betti ~Ice
i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~
umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1
SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -
1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G
91
s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no
TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~
~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f
=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ
~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i
A2 Experiment 2
Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd
saostructure background knowledge nodule Two examples from the chemical dooain ae
eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng
previously disc ered subst-uc~ures to subsequent discovery tasks
11 Input for First Eumple of poundXperimenl 2
kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i
S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))
slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)
sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J
~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI
Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))
VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)
( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I
(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i
(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I
sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)
Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j
SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)
~o e ) ~Wlnt ubstUC~oltes
Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo
95
gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~
U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l
~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J
carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )
Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))
Defxollrii ilill J ~cIl Clr carJ)
I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)
Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J
ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))
A32 Output for Experiment 3
gt lubclue inl~ JO)
Seli tlcebullbull_
m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil
SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo
aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot
ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~
licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)
Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc
wri OCCtilJtOiCS
101
4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4
~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i
(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))
42 Output for Experiment 4
PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil
Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)
ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl
u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1
O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~
s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI
W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -
~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a
S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1
TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a
103
~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~
Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-
bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)
ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)
I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))
r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l
(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)
(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)
coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)
Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i
(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at
Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e
(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ
~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj
OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs
I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten
~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull
9
SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1
I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1
Addina substructures 0 SKbullbullbull
O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1
5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I
A3 Experimeut 3
Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15
hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le
desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e
99
sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~
5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~
1~Ie c2 lt6 ~u)
QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32
c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)
lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J
SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))
A52 Output for Experiment S
gt (sulxh bullbull lmu )
Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll
RUMinl sulntucture discovey
DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl
Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~
WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11
( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I
(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I
[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt
~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90
Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J
ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot
~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I
10$
bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D
icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)
([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD
((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I
([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~
101
middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a
oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0
CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1
Ed ~lCe
109
3 SubStructure Generation
AI essental function r ~ny SJostructure dsovry sysel s le ge~eaton cf 1 ~--1 ~
and el1tons in tile input nallples There are 10 baslC approacle5 to tbe ge1eratlon prooi~l
bottom-up and top-down
The bottom-up approach to substructure generation be~lns with the smallest substrJctures n
the input eumples and iteratlyely expands elcb substructure Tte expansion nay be accomplished
by tItO different methods The first method minifft4i UpltUULort adds one nelgbborlng reiatlon 0
the substructure For enmple according to tbe tllee neighboring relations (presented in Section
2 2) of the occurrence lt [OSITlSt)TJ[SHAPE(Tl )TRl~~GLEJ[SHugtE(Sl)SQCAREJgt the
substruc~ure in Figure 21b would be etpanded 0 genente the following tbree substructures
lt [SHAE( OBJECT -0001 )TRIASGLE)[SH ugtE(OBJECT -0002 )SQC AREl ON( OBJECT -0001 OBJECT -OOOl) T[COlOR(OBJECT -0001 )REDlgt
lt SHAPE( OBJECT -0001TRIASGlEJ[SHAE(OBJECT-OOO1)SQCAREJ [ONe OBJECT -0001 OBJECT -0OOl1T][COlOR( OBJECT-0(02)-GREEN] gt
lt [SHAE(OBJECT-OOOl )rUANGLEJ(SHAPE(OBJpoundCT-OOO1)SQCAREJ [ON OBJECT-OOOlOBJECT-0002 1T[ON( OBJECT-00020BJECT-0003 JaT] gt
The second methOd comhifl4lLoIt XpcttSLort is ~ generalization of oinlmal etlansion in hich ~O
S1JCstJctures laVIng at least one object In common are gtmbined into one substuct ure For
example tbe substr-ctllre in Filure 210 could be generated by combining the followmg 10
suestrJures
ltSHugtE(OBJEC-0001 7RIASGLZl0N( OBJECi voolOBJECT-OOOllT] gt lt[ON( OBJECT ooolOBJEC -0002 )r[SHAPEi OBJECT-000 )-SQCAREJgt
The top-down lrroacl to substructure generatlon begIns JWith the largest Ossioie
suos cres C-le for tach nput eumple ~nc iteratively disconnects the sJostructure n~c 0
smalltr 3uostrJctJres As ith t~e npanslon approacl sub$trJctllre dlsconneclon nay be
11
lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1
jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_
uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly
dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec
substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1
are applicable here
Regardless of how tae metbods are applied each metbod bas advantages and disadvantages
Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev
SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle
combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a
larger substructure but a smaller more desirable substructure may be overlooked in the process
Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS
referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones
linimal expansion begins with smaller substructures expands substructures along one relation
lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource
imlts of tbe system
2~ Substructure Selection
Aier using tbe metbods of the previous sectlon to construct l set of alternatlve
substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0
onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e
roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie
substructures oased on ~her heuristic quality This section presents the major heuristics that 3re
Jpplicable to substructure evaluation
The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata
cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie
savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e
13
Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad
eiatlons of ~be origmal OCClrre1lces may be reconstructea
T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=
replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla
obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue
ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of
concepts deAned In teClS of the Instantiated substructures
26 Substructure Specialization
Specializing substructures is an essential capabtlity of a substructure discovery system For
Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f
mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences
Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the
aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached
atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this
specialization step allows the substructure discovery system to take advantage of additional
information in the input examples avoid learning overly general substructure concepts and Clore
rapidly discover a specific disjunctive substructure concet T115 section resents an approach to
substructure specialization
One technique for specializing a substructure is to ~rform inductive Inference on the
extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the
substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended
occurrences consists of the substructures obtained by upanding each occurrence llth each
neighboring relation of the occurrence Once the set of eltended occurrences is onstructed
Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added
neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg
relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres
19
2 - Background Knowledampe
Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s
wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e
discovery process along more promising paths through tbe space of alternatlve substructJres T~lS
section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate
substructures that are more likely to list in tbe current application dOClaln
The ~wo major functions of tbe substructure backround knowledge are to main tain boh
lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a
glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of
substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu
substructures are defined in terms of more primitive substructures This arrangement suggests ~n
archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie
substructures serve as the justifications for more complex substructures at a higher level in tle
hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote
substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie
substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures
elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying
tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput
uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are
ultimately justified by the relations in tbe input eumples The substructure discoey roess
may then use these background knowledge substructures as a starting point for ieneratllg
alternative substructures
Other f Jrms of background knowledge apply to the task of substrlcure ilscovery
mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses
~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee
1
L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do
BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do
if (not (member e 0raquo S S U leI
return D
Figure 24 SubstrUcture Discovery ~lgoritbm
Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom
the substructures In S until either the computational limlt L is exceeded or the set of candidate
substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe
heuristics of Section 2A are employed to choose the best substructure from the llternatives in S
The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he
implementation Section 322 descibes the comrutatlon used in StBDlE although otbe
methods may be used For instance the heutlStics could be weighted selected by lhe user or
selected by lhe backcround knowledae However uy substructure selecion method should
involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in
BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of
discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi
Section 23 are then used to construct a set of new substructures that are stored in E Those
subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the
loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res
23
CHAPTER 3
mE SUBDUE SYSTE~I
An implementation of he substructure discovery methodology descrIbed in the frevous
chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments
Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a
substructure concept discoverer and as a mod le in a more robust nachlne learning system In
addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a
sbstructure specialization module and a substructure background knowledge module for utiiizilg
prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd
krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres
ovillCh of these substruc1res are resent in the lnput examples
i
SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of
SubstructJre Speeializer-k of(
C ser-Supplied Input Exampies Substructure Discovery
Figure 31 Tlle SlSDLE System
(a) SUbstructure
[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]
(b) External Representation
I I
~ Sl
i V
~-T t -
I
~ Cl i
(c) Inte~al Represe~tation
Figure 32 Substructure Representation
21
SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do
SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI
return -EWSlBS
Figure 33 Substructure Elpanslon Algorithm
Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae
stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are
stored in For each neighboring relation a new substructure is formed by adding the nelghborrg
-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to
the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een
considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes
~xanded from the original substructure
322 Selection
The heuristic-based substructure discovery module selects for orslderation those
substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs
compactness connectivity and covenge These four heurlstus are used 0 order he set of
alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut
ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~
selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons
~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he
eurstlc value of a substructur
29
urlent set of Input xam-llS
deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e
substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le
Input examples
r rtlations Sl compactnns(S) = ----shy
For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus
conpactness bull 4 bull 1
The third heUristic connectivity measures the amount of extemal connection in 1he
occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of
external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be
onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed
connectivit-Spound) = locel
Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In
Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure
22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO
outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons
Thus connectivity (12 41 bull 13
T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es
31
ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt
First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~
tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as
there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle
substructure as denned in Section 3U The Object names within the substructures le aroltrar
symbols generated by the system for eacb ewly constructed substructure
S bull i lt 11 OBJECT-0001gt f
~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S
and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by
addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5
ae st)red In E
Input Example Substructure
amp i 511
~I Rl I- lSi ~
52 I S I
LJ L
Figure 3-1 Simple Example
33
11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~
1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l
le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~
suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~
by the algorabm
33 Sllbstructure Specialization
The substructure specialiution module In SLBDLE employs t simple teChlllq ue r
s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25
SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added
attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e
)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh
the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e
i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota
eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d
knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e
s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton
rocess
331 Specia1ization Algorithm
Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te
algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~
elamrles
he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~
Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l
osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~
eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e
35
~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor
Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle
mount of specializatlon In Srpc
A-rount_of_spKlaLization( S_) = --_______ (occl
The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture
acKiround knowledge along w1tb the originally discovered substrucure
332 Example
As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a
lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute
-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he
sare substruc~ure emerges as in Figure 3-
S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt
ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule
Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES
A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l
~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli
s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt
Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~
SPECSLBS In thIs example IS
S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt
T~is specialized substructure also las J occurrerces but 3 untque values thus
amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC
specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he
substrucnre background knowledge along with the oriiinally discovered substucur
SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon
about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be
s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred
substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq
SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue
nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization
nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be
mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec
s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks
3 SubstMlcture Background Kllowledge
T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts
39
O T SHA ETRL~GLE SHAPE-SQLARE
Figure 37 Substructure Backgolnd KnowledgeEumpie
ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt
su~ported
As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge
lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle
ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s
SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y
Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute
est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt
he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn
n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and
~1
1--- ~ed v blue ~
i I shyI
I~
I
I
I
I
SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE
Fgure 39 Three Background Knoledge $ucstroJcJres
he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e
new substructu-es In terms of the substructures already known
3 bull 42 IdentifyiD1 Substructures in Eumples
As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-
~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___
~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I
i SH~PETRIAGlE i t
I I
I I I SHAPEmiddotSQlARE
[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]
Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]
Fig-ure 310 Substructure IdentiAcation Eumple
substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles
disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks
--
lll=H Eumple - r
i III
I I H
8r- shyj
V V
Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red
C Value(S)a8
L 6 a
al uelt S)-312
alueltS)-21
0 alue(S)96
middotalue(S)-11
6 I
bt
10 ~ V
- -
I
alue(S)middot31~ aue(S~-96 -- - middotalue(S)92
A I
j
aI ValueS)-312 I
I I
I bull
---
- -I
Vaiue(S)-103
rgure 41 First Etample of EXiIrinent 1
elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS
of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll
EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$
overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0
consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r
be Input examples
2 Ex~riment 2 Specialization and Background Knowledge
The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem
Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0
~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS
lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege
During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task
As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut
discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs
acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa
Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed
subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk
The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure
-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e
cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s
shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese
discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized
substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed
rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces
the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed
substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe
background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to
deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe
background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres
Ii
C 0
H- C C - Ii C1
(Sr C1 rl
C C ~
C C C-H C C- ii C C-Ii 1 i
~ Sr- C C C
H-C C-Ii H C
Ii H
H
fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule
13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples
tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~
of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna
clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn
whIch to focus ile attentIon of the conceptual clusterI~ rocess
A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA
Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer
lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific
Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e
added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features
t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and
CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered
Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe
SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas
r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single
)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d
subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle
substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure
spannng ncre han one ualple
Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni
and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle
53
nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy
descrlOed il [10gensel87J the two best d usterngs discovered are
Color of engine Aheels IS Blackmiddot middotWhitemiddot
W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd
subs-~c~~re discovery =odule is
lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt
In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By
addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI
exa~Fles Je ~wo best lusterngs discovered lre
umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four
Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec
JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure
attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen
har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE
T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~
suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the
StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne
IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree
that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators
within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover
macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~
CiscJsses similrlties and differences between SLBDLE and PLoSD
Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure
of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The
Jperators forths conaln are taken from [isson80] and are repeated e10w
PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY
PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)
STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)
LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)
For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec
e
51
STACK(XZ)
~ CSTACKCYZ) PICKLPOi)
PCTDOW(Y)
Figure 49 Diseovered (alto-operator
system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy
vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be
~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex
45 Eltperiment 5 Data Abstraction and Feature Formation
EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he
cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3
arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng
on the same Tu3S Instruments Eplorer as the SLBDCE system
Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les
given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores
to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in
the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of
178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J
5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C
SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c
61
--
~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll
Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~
hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH
program searCles for tWo tpes of substrucure In tbe blocks world domain The first type
involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a
set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by
the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes
of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall
1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he
ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of
SlBDLE
lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S
cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce
OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence
~ ~ shyB
I ~ (
E G- c c=7 0 IA B CD
i Lr
(a) ( )
63
A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1
0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL
E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL
The tommcm-realiotships-list contairs the four relations Ossessed by more than half the
candidates
Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt
~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le
Sumber of propUiH In Intltwto
Number of propenlH n Ilnlon
vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-
ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre
A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5
65
~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e
ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~
difficult Running both tbe sequence and common relation discovery processes nore tban once
mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl
difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd
domain
T~e final comFarlson of 11e two syseS Involves ~he representation used to store the
discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the
subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe
sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot
lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1
roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse
Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines
comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe
semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto
oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0
SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert
background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l
more general epresentaticn such as the semantic network used y ~he ARCH pro~ran
53 COfDitive Optimization in olrs S~R Program
R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte
~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt
[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S
Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-
-(5-s)C =---shy
s
slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s
represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1
polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a
inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR
seeks a grammar whose eiementS m8llnize eVe
S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n
322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number
of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f
OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings
of he substrucre can be upressed as
Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy
cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not
conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on
he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into
~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture
lmprovements to tbe cognltive savings measure
~ Substructure Oiscovery ill Uihitehalls PLAD System
Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons
These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn
69
The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as
Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs
efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda
overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta
mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence
Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ
COXTEXT 1 ogsav macro OS
100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot
8 5 ~u - (~tmiddot z)middot
Instantiated Action Sequlnce lt B 112 Z ~11J Z
COlEXT 2 ogsav macro~s
20 ~l bull (~lumiddot Z)shy
Instantiated Action ~uence A B 11
Al interesting macrops discovered End
Figure 53 PLAXD Etam~le
1
OLPTER 6
COSC1USION
Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S
of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty
Url about such an environnent the entity nust abstract over unInuresting jetJil and discover
substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s
teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced
domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l
this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve
61 Summary
The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl
onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s
aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e
environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge
equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl
aleratves nvoiec In 3 computational method 01 discovering sustrucure
A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y
he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans
omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg
t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre
imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5
T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~
iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~
he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~
optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~
The ~ialization module modina tile substructure to apply in a more constrained ewironme
Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i
nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S
aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f
be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes
and the possible applications of the SLBDLE substructure discovery system in a vanety f
domaltls
Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf
~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-
lass vf problems in the area of nachine learning OJeraton In real-world domains demands f
eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates
n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll
of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes
Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1
S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~
62 Future Research
$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE
system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei
include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS
Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy
reVIOUS suess of silIlLu rJe appiiauns
esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of
substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can
tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]
lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can
be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions
in both the background knowledge architecture 3nd the opeations peforned by the discovery
algorlthm
Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess
may arise from a better method of substrlctie instantiation C rrent letbocs do not
satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation
by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C
lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres
exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba
Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods
would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery
process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt
the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be
discovered with a single application of the discovery algcnthr1
Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~
uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve
the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery
rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge
z~ess l ~
One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount
knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a
knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts
623 Applications
Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste
SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and
compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung
systems
One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage
composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e
eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon
slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of
abstraction
The n~ to access information fom larse databases has betooe oonlonlac In ncs
omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so
do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver
repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard
Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte
database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture
sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1
9
REFERE~CES
[Doyie79J
[HoaS3]
L --]~ ason
~Iicalski33bl
G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)
J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162
J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27
R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288
K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987
W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933
W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947
J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J
General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16
J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977
D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC
~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239
R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy
R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13
R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363
S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3
T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50
J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp
~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~
83
- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es
DeJ IS 11
dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T
s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11
nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11
race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T
Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order
=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je
~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form
(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I
The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC
The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure
ltela01s s the list ~f ~eatlons deining le substrucure r occurenc
In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage
dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec
85
gt bull ~
)~c1 tt I
0 c5 ~ -
13 Output for First Example of Experiment 1 Limit - 6
3qll rilebullbullbullbull
anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~
Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll
S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi
-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I
(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l
Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD
-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i
S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)
11 aCC
A14 Output for First Example of E(periment 1 Limit 10
gt svIdue liait 10)
n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t
Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco
T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~
S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO
~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS
~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J
A 16 Input for Second Example ot Experiment 1
Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J
conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)
Al7 Output for Second Example ot Etperitnent 1 Limit - 4
l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I
5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs
c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I
middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt
middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I
So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS
coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j
ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103
~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~
89
S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a
ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO
~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H
ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09
lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()
~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull
[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I
l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i
[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~
C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i
SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1
co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i
middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))
Ec ~lce
Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10
Betti ~Ice
i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~
umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1
SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -
1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G
91
s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no
TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~
~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f
=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ
~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i
A2 Experiment 2
Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd
saostructure background knowledge nodule Two examples from the chemical dooain ae
eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng
previously disc ered subst-uc~ures to subsequent discovery tasks
11 Input for First Eumple of poundXperimenl 2
kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i
S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))
slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)
sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J
~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI
Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))
VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)
( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I
(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i
(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I
sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)
Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j
SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)
~o e ) ~Wlnt ubstUC~oltes
Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo
95
gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~
U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l
~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J
carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )
Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))
Defxollrii ilill J ~cIl Clr carJ)
I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)
Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J
ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))
A32 Output for Experiment 3
gt lubclue inl~ JO)
Seli tlcebullbull_
m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil
SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo
aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot
ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~
licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)
Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc
wri OCCtilJtOiCS
101
4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4
~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i
(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))
42 Output for Experiment 4
PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil
Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)
ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl
u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1
O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~
s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI
W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -
~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a
S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1
TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a
103
~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~
Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-
bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)
ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)
I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))
r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l
(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)
(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)
coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)
Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i
(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at
Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e
(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ
~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj
OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs
I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten
~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull
9
SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1
I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1
Addina substructures 0 SKbullbullbull
O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1
5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I
A3 Experimeut 3
Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15
hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le
desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e
99
sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~
5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~
1~Ie c2 lt6 ~u)
QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32
c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)
lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J
SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))
A52 Output for Experiment S
gt (sulxh bullbull lmu )
Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll
RUMinl sulntucture discovey
DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl
Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~
WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11
( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I
(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I
[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt
~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90
Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J
ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot
~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I
10$
bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D
icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)
([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD
((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I
([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~
101
middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a
oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0
CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1
Ed ~lCe
109
lat nave a ~3ge nunber of reatlons and objeocts n cc~rrcn E~~lus~I aFPtcatIOl f =1
jsccnnelttion can be avoided by recovLng only tJOSf relltlons ~Jat occur t elst n ~~e I_
uacples elding suostIlres ~middotltb perhaps an ncrusea llJm=er of Occurences Lastly
dtsccnnelttion can be appbed more intelligently by removing -elatlons thlt Yleld highly corneltec
substructures Tbe tecbniq ues of finding artiadiUicn points (Reingold 77J and eta fOampItlS ~Zlbn i 1
are applicable here
Regardless of how tae metbods are applied each metbod bas advantages and disadvantages
Althougb tbe tOp-lt1own approacbes allow quicker ider~iftcation of isolated substructures hev
SUifer from high computation costs due tomiddot frequent comparisons of larger substructures Both tle
combinatlon e~ansion and cut disconnection methods are apFropriate for quickly arriving at a
larger substructure but a smaller more desirable substructure may be overlooked in the process
Also in tbe contut or building a substructure hierarchy beginning with smaller substructures LS
referTed because tbe larger substructures can then be exfCssed in terms of tbe smaller ones
linimal expansion begins with smaller substructures expands substructures along one relation
lnd thUS is more likely to discover smaller substrucures middotwltin tbe computational resource
imlts of tbe system
2~ Substructure Selection
Aier using tbe metbods of the previous sectlon to construct l set of alternatlve
substrJctJres the substructure di5lovery algonthm must coClse wbu1 of tbese substructures 0
onsider the best byOthetual substructure Th1S LS tbe task of substructure selection T1e
roposed metbod of selectlcn employs a heuristic evaluatlon fnction ~o order he set Jf Jlternatie
substructures oased on ~her heuristic quality This section presents the major heuristics that 3re
Jpplicable to substructure evaluation
The first helristlc cognitive savings is the underlyingcea ehind seVll utility lnd cata
cEnFreSS1on heurIstics enpicyed in machine iearning (linton6i WhitehallS WollfS2J CJgnaie
savu-gs mnsures tbe anJunt of cau compression JbUlnea oy lrlyini 11e suostrucJ-e to e
13
Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad
eiatlons of ~be origmal OCClrre1lces may be reconstructea
T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=
replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla
obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue
ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of
concepts deAned In teClS of the Instantiated substructures
26 Substructure Specialization
Specializing substructures is an essential capabtlity of a substructure discovery system For
Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f
mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences
Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the
aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached
atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this
specialization step allows the substructure discovery system to take advantage of additional
information in the input examples avoid learning overly general substructure concepts and Clore
rapidly discover a specific disjunctive substructure concet T115 section resents an approach to
substructure specialization
One technique for specializing a substructure is to ~rform inductive Inference on the
extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the
substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended
occurrences consists of the substructures obtained by upanding each occurrence llth each
neighboring relation of the occurrence Once the set of eltended occurrences is onstructed
Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added
neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg
relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres
19
2 - Background Knowledampe
Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s
wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e
discovery process along more promising paths through tbe space of alternatlve substructJres T~lS
section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate
substructures that are more likely to list in tbe current application dOClaln
The ~wo major functions of tbe substructure backround knowledge are to main tain boh
lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a
glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of
substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu
substructures are defined in terms of more primitive substructures This arrangement suggests ~n
archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie
substructures serve as the justifications for more complex substructures at a higher level in tle
hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote
substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie
substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures
elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying
tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput
uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are
ultimately justified by the relations in tbe input eumples The substructure discoey roess
may then use these background knowledge substructures as a starting point for ieneratllg
alternative substructures
Other f Jrms of background knowledge apply to the task of substrlcure ilscovery
mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses
~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee
1
L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do
BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do
if (not (member e 0raquo S S U leI
return D
Figure 24 SubstrUcture Discovery ~lgoritbm
Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom
the substructures In S until either the computational limlt L is exceeded or the set of candidate
substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe
heuristics of Section 2A are employed to choose the best substructure from the llternatives in S
The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he
implementation Section 322 descibes the comrutatlon used in StBDlE although otbe
methods may be used For instance the heutlStics could be weighted selected by lhe user or
selected by lhe backcround knowledae However uy substructure selecion method should
involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in
BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of
discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi
Section 23 are then used to construct a set of new substructures that are stored in E Those
subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the
loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res
23
CHAPTER 3
mE SUBDUE SYSTE~I
An implementation of he substructure discovery methodology descrIbed in the frevous
chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments
Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a
substructure concept discoverer and as a mod le in a more robust nachlne learning system In
addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a
sbstructure specialization module and a substructure background knowledge module for utiiizilg
prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd
krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres
ovillCh of these substruc1res are resent in the lnput examples
i
SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of
SubstructJre Speeializer-k of(
C ser-Supplied Input Exampies Substructure Discovery
Figure 31 Tlle SlSDLE System
(a) SUbstructure
[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]
(b) External Representation
I I
~ Sl
i V
~-T t -
I
~ Cl i
(c) Inte~al Represe~tation
Figure 32 Substructure Representation
21
SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do
SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI
return -EWSlBS
Figure 33 Substructure Elpanslon Algorithm
Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae
stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are
stored in For each neighboring relation a new substructure is formed by adding the nelghborrg
-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to
the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een
considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes
~xanded from the original substructure
322 Selection
The heuristic-based substructure discovery module selects for orslderation those
substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs
compactness connectivity and covenge These four heurlstus are used 0 order he set of
alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut
ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~
selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons
~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he
eurstlc value of a substructur
29
urlent set of Input xam-llS
deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e
substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le
Input examples
r rtlations Sl compactnns(S) = ----shy
For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus
conpactness bull 4 bull 1
The third heUristic connectivity measures the amount of extemal connection in 1he
occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of
external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be
onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed
connectivit-Spound) = locel
Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In
Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure
22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO
outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons
Thus connectivity (12 41 bull 13
T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es
31
ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt
First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~
tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as
there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle
substructure as denned in Section 3U The Object names within the substructures le aroltrar
symbols generated by the system for eacb ewly constructed substructure
S bull i lt 11 OBJECT-0001gt f
~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S
and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by
addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5
ae st)red In E
Input Example Substructure
amp i 511
~I Rl I- lSi ~
52 I S I
LJ L
Figure 3-1 Simple Example
33
11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~
1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l
le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~
suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~
by the algorabm
33 Sllbstructure Specialization
The substructure specialiution module In SLBDLE employs t simple teChlllq ue r
s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25
SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added
attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e
)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh
the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e
i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota
eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d
knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e
s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton
rocess
331 Specia1ization Algorithm
Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te
algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~
elamrles
he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~
Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l
osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~
eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e
35
~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor
Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle
mount of specializatlon In Srpc
A-rount_of_spKlaLization( S_) = --_______ (occl
The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture
acKiround knowledge along w1tb the originally discovered substrucure
332 Example
As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a
lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute
-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he
sare substruc~ure emerges as in Figure 3-
S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt
ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule
Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES
A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l
~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli
s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt
Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~
SPECSLBS In thIs example IS
S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt
T~is specialized substructure also las J occurrerces but 3 untque values thus
amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC
specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he
substrucnre background knowledge along with the oriiinally discovered substucur
SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon
about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be
s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred
substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq
SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue
nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization
nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be
mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec
s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks
3 SubstMlcture Background Kllowledge
T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts
39
O T SHA ETRL~GLE SHAPE-SQLARE
Figure 37 Substructure Backgolnd KnowledgeEumpie
ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt
su~ported
As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge
lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle
ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s
SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y
Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute
est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt
he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn
n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and
~1
1--- ~ed v blue ~
i I shyI
I~
I
I
I
I
SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE
Fgure 39 Three Background Knoledge $ucstroJcJres
he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e
new substructu-es In terms of the substructures already known
3 bull 42 IdentifyiD1 Substructures in Eumples
As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-
~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___
~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I
i SH~PETRIAGlE i t
I I
I I I SHAPEmiddotSQlARE
[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]
Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]
Fig-ure 310 Substructure IdentiAcation Eumple
substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles
disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks
--
lll=H Eumple - r
i III
I I H
8r- shyj
V V
Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red
C Value(S)a8
L 6 a
al uelt S)-312
alueltS)-21
0 alue(S)96
middotalue(S)-11
6 I
bt
10 ~ V
- -
I
alue(S)middot31~ aue(S~-96 -- - middotalue(S)92
A I
j
aI ValueS)-312 I
I I
I bull
---
- -I
Vaiue(S)-103
rgure 41 First Etample of EXiIrinent 1
elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS
of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll
EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$
overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0
consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r
be Input examples
2 Ex~riment 2 Specialization and Background Knowledge
The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem
Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0
~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS
lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege
During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task
As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut
discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs
acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa
Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed
subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk
The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure
-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e
cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s
shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese
discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized
substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed
rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces
the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed
substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe
background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to
deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe
background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres
Ii
C 0
H- C C - Ii C1
(Sr C1 rl
C C ~
C C C-H C C- ii C C-Ii 1 i
~ Sr- C C C
H-C C-Ii H C
Ii H
H
fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule
13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples
tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~
of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna
clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn
whIch to focus ile attentIon of the conceptual clusterI~ rocess
A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA
Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer
lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific
Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e
added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features
t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and
CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered
Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe
SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas
r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single
)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d
subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle
substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure
spannng ncre han one ualple
Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni
and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle
53
nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy
descrlOed il [10gensel87J the two best d usterngs discovered are
Color of engine Aheels IS Blackmiddot middotWhitemiddot
W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd
subs-~c~~re discovery =odule is
lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt
In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By
addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI
exa~Fles Je ~wo best lusterngs discovered lre
umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four
Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec
JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure
attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen
har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE
T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~
suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the
StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne
IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree
that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators
within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover
macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~
CiscJsses similrlties and differences between SLBDLE and PLoSD
Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure
of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The
Jperators forths conaln are taken from [isson80] and are repeated e10w
PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY
PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)
STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)
LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)
For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec
e
51
STACK(XZ)
~ CSTACKCYZ) PICKLPOi)
PCTDOW(Y)
Figure 49 Diseovered (alto-operator
system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy
vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be
~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex
45 Eltperiment 5 Data Abstraction and Feature Formation
EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he
cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3
arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng
on the same Tu3S Instruments Eplorer as the SLBDCE system
Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les
given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores
to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in
the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of
178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J
5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C
SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c
61
--
~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll
Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~
hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH
program searCles for tWo tpes of substrucure In tbe blocks world domain The first type
involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a
set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by
the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes
of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall
1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he
ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of
SlBDLE
lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S
cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce
OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence
~ ~ shyB
I ~ (
E G- c c=7 0 IA B CD
i Lr
(a) ( )
63
A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1
0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL
E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL
The tommcm-realiotships-list contairs the four relations Ossessed by more than half the
candidates
Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt
~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le
Sumber of propUiH In Intltwto
Number of propenlH n Ilnlon
vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-
ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre
A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5
65
~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e
ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~
difficult Running both tbe sequence and common relation discovery processes nore tban once
mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl
difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd
domain
T~e final comFarlson of 11e two syseS Involves ~he representation used to store the
discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the
subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe
sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot
lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1
roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse
Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines
comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe
semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto
oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0
SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert
background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l
more general epresentaticn such as the semantic network used y ~he ARCH pro~ran
53 COfDitive Optimization in olrs S~R Program
R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte
~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt
[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S
Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-
-(5-s)C =---shy
s
slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s
represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1
polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a
inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR
seeks a grammar whose eiementS m8llnize eVe
S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n
322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number
of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f
OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings
of he substrucre can be upressed as
Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy
cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not
conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on
he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into
~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture
lmprovements to tbe cognltive savings measure
~ Substructure Oiscovery ill Uihitehalls PLAD System
Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons
These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn
69
The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as
Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs
efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda
overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta
mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence
Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ
COXTEXT 1 ogsav macro OS
100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot
8 5 ~u - (~tmiddot z)middot
Instantiated Action Sequlnce lt B 112 Z ~11J Z
COlEXT 2 ogsav macro~s
20 ~l bull (~lumiddot Z)shy
Instantiated Action ~uence A B 11
Al interesting macrops discovered End
Figure 53 PLAXD Etam~le
1
OLPTER 6
COSC1USION
Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S
of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty
Url about such an environnent the entity nust abstract over unInuresting jetJil and discover
substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s
teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced
domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l
this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve
61 Summary
The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl
onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s
aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e
environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge
equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl
aleratves nvoiec In 3 computational method 01 discovering sustrucure
A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y
he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans
omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg
t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre
imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5
T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~
iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~
he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~
optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~
The ~ialization module modina tile substructure to apply in a more constrained ewironme
Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i
nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S
aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f
be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes
and the possible applications of the SLBDLE substructure discovery system in a vanety f
domaltls
Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf
~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-
lass vf problems in the area of nachine learning OJeraton In real-world domains demands f
eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates
n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll
of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes
Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1
S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~
62 Future Research
$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE
system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei
include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS
Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy
reVIOUS suess of silIlLu rJe appiiauns
esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of
substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can
tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]
lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can
be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions
in both the background knowledge architecture 3nd the opeations peforned by the discovery
algorlthm
Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess
may arise from a better method of substrlctie instantiation C rrent letbocs do not
satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation
by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C
lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres
exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba
Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods
would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery
process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt
the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be
discovered with a single application of the discovery algcnthr1
Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~
uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve
the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery
rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge
z~ess l ~
One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount
knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a
knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts
623 Applications
Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste
SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and
compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung
systems
One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage
composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e
eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon
slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of
abstraction
The n~ to access information fom larse databases has betooe oonlonlac In ncs
omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so
do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver
repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard
Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte
database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture
sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1
9
REFERE~CES
[Doyie79J
[HoaS3]
L --]~ ason
~Iicalski33bl
G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)
J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162
J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27
R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288
K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987
W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933
W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947
J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J
General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16
J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977
D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC
~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239
R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy
R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13
R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363
S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3
T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50
J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp
~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~
83
- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es
DeJ IS 11
dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T
s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11
nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11
race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T
Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order
=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je
~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form
(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I
The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC
The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure
ltela01s s the list ~f ~eatlons deining le substrucure r occurenc
In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage
dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec
85
gt bull ~
)~c1 tt I
0 c5 ~ -
13 Output for First Example of Experiment 1 Limit - 6
3qll rilebullbullbullbull
anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~
Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll
S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi
-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I
(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l
Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD
-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i
S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)
11 aCC
A14 Output for First Example of E(periment 1 Limit 10
gt svIdue liait 10)
n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t
Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco
T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~
S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO
~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS
~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J
A 16 Input for Second Example ot Experiment 1
Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J
conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)
Al7 Output for Second Example ot Etperitnent 1 Limit - 4
l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I
5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs
c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I
middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt
middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I
So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS
coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j
ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103
~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~
89
S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a
ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO
~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H
ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09
lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()
~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull
[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I
l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i
[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~
C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i
SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1
co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i
middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))
Ec ~lce
Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10
Betti ~Ice
i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~
umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1
SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -
1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G
91
s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no
TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~
~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f
=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ
~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i
A2 Experiment 2
Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd
saostructure background knowledge nodule Two examples from the chemical dooain ae
eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng
previously disc ered subst-uc~ures to subsequent discovery tasks
11 Input for First Eumple of poundXperimenl 2
kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i
S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))
slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)
sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J
~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI
Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))
VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)
( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I
(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i
(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I
sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)
Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j
SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)
~o e ) ~Wlnt ubstUC~oltes
Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo
95
gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~
U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l
~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J
carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )
Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))
Defxollrii ilill J ~cIl Clr carJ)
I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)
Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J
ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))
A32 Output for Experiment 3
gt lubclue inl~ JO)
Seli tlcebullbull_
m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil
SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo
aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot
ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~
licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)
Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc
wri OCCtilJtOiCS
101
4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4
~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i
(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))
42 Output for Experiment 4
PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil
Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)
ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl
u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1
O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~
s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI
W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -
~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a
S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1
TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a
103
~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~
Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-
bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)
ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)
I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))
r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l
(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)
(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)
coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)
Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i
(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at
Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e
(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ
~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj
OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs
I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten
~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull
9
SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1
I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1
Addina substructures 0 SKbullbullbull
O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1
5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I
A3 Experimeut 3
Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15
hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le
desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e
99
sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~
5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~
1~Ie c2 lt6 ~u)
QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32
c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)
lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J
SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))
A52 Output for Experiment S
gt (sulxh bullbull lmu )
Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll
RUMinl sulntucture discovey
DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl
Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~
WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11
( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I
(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I
[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt
~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90
Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J
ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot
~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I
10$
bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D
icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)
([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD
((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I
([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~
101
middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a
oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0
CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1
Ed ~lCe
109
Jccurene FrJCl ~be C~llon obl~t syClbois within re reauons tbe oel~r~g ecs ad
eiatlons of ~be origmal OCClrre1lces may be reconstructea
T~us sngle eiatlon substtlcture ItISantlatlon ecars he lore acura te a~-oac=
replacing substructure occurrences with a stngle Clore absiract entity representmg the mgla
obj~ts and relations in the occurrence Replacing objects and relatlons throl1gb substrtue
ulstantiation reduces the complexity of the input examples and allows sub5equent d~overy of
concepts deAned In teClS of the Instantiated substructures
26 Substructure Specialization
Specializing substructures is an essential capabtlity of a substructure discovery system For
Instance suppose the system finds six vccuzences of an aromatic ring substructure within 1 se~ ~f
mput examples Three of the occurrences have an attached chlorine atom and three OCC1rrences
Ilave an attached bromine atom The discovery S)sttm may benefit by retaInIng not only the
aromatic ring substrucure but also a Clore specific aromatic ring substructure~ih an attached
atom wllose type is described by the dis1unction middotchionne or bromine- Perfmntng this
specialization step allows the substructure discovery system to take advantage of additional
information in the input examples avoid learning overly general substructure concepts and Clore
rapidly discover a specific disjunctive substructure concet T115 section resents an approach to
substructure specialization
One technique for specializing a substructure is to ~rform inductive Inference on the
extended occurrences of tbe substructure An e3tended occurrence of a substructur IS the
substructure generated by adding one nelghbonng ~elaton to lle ocurrence The set of extended
occurrences consists of the substructures obtained by upanding each occurrence llth each
neighboring relation of the occurrence Once the set of eltended occurrences is onstructed
Inductive Inference generalization techniques ((ichalsklS3a an be aFplied to the newly added
neighboring -elations and thel orresponding values Tle resultni generaLzed Ieghborlrg
relations are then appended ~O ile lrilinal sucslcue t) prccue Ielo s--ecialzec SIbstlclres
19
2 - Background Knowledampe
Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s
wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e
discovery process along more promising paths through tbe space of alternatlve substructJres T~lS
section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate
substructures that are more likely to list in tbe current application dOClaln
The ~wo major functions of tbe substructure backround knowledge are to main tain boh
lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a
glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of
substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu
substructures are defined in terms of more primitive substructures This arrangement suggests ~n
archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie
substructures serve as the justifications for more complex substructures at a higher level in tle
hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote
substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie
substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures
elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying
tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput
uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are
ultimately justified by the relations in tbe input eumples The substructure discoey roess
may then use these background knowledge substructures as a starting point for ieneratllg
alternative substructures
Other f Jrms of background knowledge apply to the task of substrlcure ilscovery
mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses
~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee
1
L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do
BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do
if (not (member e 0raquo S S U leI
return D
Figure 24 SubstrUcture Discovery ~lgoritbm
Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom
the substructures In S until either the computational limlt L is exceeded or the set of candidate
substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe
heuristics of Section 2A are employed to choose the best substructure from the llternatives in S
The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he
implementation Section 322 descibes the comrutatlon used in StBDlE although otbe
methods may be used For instance the heutlStics could be weighted selected by lhe user or
selected by lhe backcround knowledae However uy substructure selecion method should
involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in
BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of
discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi
Section 23 are then used to construct a set of new substructures that are stored in E Those
subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the
loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res
23
CHAPTER 3
mE SUBDUE SYSTE~I
An implementation of he substructure discovery methodology descrIbed in the frevous
chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments
Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a
substructure concept discoverer and as a mod le in a more robust nachlne learning system In
addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a
sbstructure specialization module and a substructure background knowledge module for utiiizilg
prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd
krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres
ovillCh of these substruc1res are resent in the lnput examples
i
SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of
SubstructJre Speeializer-k of(
C ser-Supplied Input Exampies Substructure Discovery
Figure 31 Tlle SlSDLE System
(a) SUbstructure
[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]
(b) External Representation
I I
~ Sl
i V
~-T t -
I
~ Cl i
(c) Inte~al Represe~tation
Figure 32 Substructure Representation
21
SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do
SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI
return -EWSlBS
Figure 33 Substructure Elpanslon Algorithm
Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae
stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are
stored in For each neighboring relation a new substructure is formed by adding the nelghborrg
-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to
the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een
considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes
~xanded from the original substructure
322 Selection
The heuristic-based substructure discovery module selects for orslderation those
substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs
compactness connectivity and covenge These four heurlstus are used 0 order he set of
alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut
ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~
selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons
~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he
eurstlc value of a substructur
29
urlent set of Input xam-llS
deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e
substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le
Input examples
r rtlations Sl compactnns(S) = ----shy
For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus
conpactness bull 4 bull 1
The third heUristic connectivity measures the amount of extemal connection in 1he
occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of
external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be
onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed
connectivit-Spound) = locel
Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In
Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure
22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO
outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons
Thus connectivity (12 41 bull 13
T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es
31
ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt
First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~
tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as
there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle
substructure as denned in Section 3U The Object names within the substructures le aroltrar
symbols generated by the system for eacb ewly constructed substructure
S bull i lt 11 OBJECT-0001gt f
~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S
and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by
addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5
ae st)red In E
Input Example Substructure
amp i 511
~I Rl I- lSi ~
52 I S I
LJ L
Figure 3-1 Simple Example
33
11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~
1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l
le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~
suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~
by the algorabm
33 Sllbstructure Specialization
The substructure specialiution module In SLBDLE employs t simple teChlllq ue r
s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25
SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added
attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e
)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh
the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e
i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota
eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d
knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e
s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton
rocess
331 Specia1ization Algorithm
Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te
algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~
elamrles
he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~
Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l
osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~
eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e
35
~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor
Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle
mount of specializatlon In Srpc
A-rount_of_spKlaLization( S_) = --_______ (occl
The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture
acKiround knowledge along w1tb the originally discovered substrucure
332 Example
As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a
lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute
-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he
sare substruc~ure emerges as in Figure 3-
S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt
ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule
Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES
A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l
~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli
s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt
Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~
SPECSLBS In thIs example IS
S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt
T~is specialized substructure also las J occurrerces but 3 untque values thus
amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC
specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he
substrucnre background knowledge along with the oriiinally discovered substucur
SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon
about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be
s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred
substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq
SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue
nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization
nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be
mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec
s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks
3 SubstMlcture Background Kllowledge
T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts
39
O T SHA ETRL~GLE SHAPE-SQLARE
Figure 37 Substructure Backgolnd KnowledgeEumpie
ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt
su~ported
As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge
lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle
ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s
SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y
Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute
est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt
he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn
n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and
~1
1--- ~ed v blue ~
i I shyI
I~
I
I
I
I
SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE
Fgure 39 Three Background Knoledge $ucstroJcJres
he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e
new substructu-es In terms of the substructures already known
3 bull 42 IdentifyiD1 Substructures in Eumples
As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-
~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___
~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I
i SH~PETRIAGlE i t
I I
I I I SHAPEmiddotSQlARE
[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]
Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]
Fig-ure 310 Substructure IdentiAcation Eumple
substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles
disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks
--
lll=H Eumple - r
i III
I I H
8r- shyj
V V
Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red
C Value(S)a8
L 6 a
al uelt S)-312
alueltS)-21
0 alue(S)96
middotalue(S)-11
6 I
bt
10 ~ V
- -
I
alue(S)middot31~ aue(S~-96 -- - middotalue(S)92
A I
j
aI ValueS)-312 I
I I
I bull
---
- -I
Vaiue(S)-103
rgure 41 First Etample of EXiIrinent 1
elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS
of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll
EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$
overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0
consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r
be Input examples
2 Ex~riment 2 Specialization and Background Knowledge
The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem
Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0
~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS
lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege
During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task
As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut
discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs
acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa
Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed
subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk
The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure
-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e
cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s
shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese
discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized
substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed
rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces
the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed
substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe
background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to
deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe
background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres
Ii
C 0
H- C C - Ii C1
(Sr C1 rl
C C ~
C C C-H C C- ii C C-Ii 1 i
~ Sr- C C C
H-C C-Ii H C
Ii H
H
fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule
13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples
tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~
of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna
clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn
whIch to focus ile attentIon of the conceptual clusterI~ rocess
A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA
Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer
lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific
Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e
added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features
t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and
CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered
Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe
SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas
r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single
)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d
subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle
substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure
spannng ncre han one ualple
Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni
and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle
53
nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy
descrlOed il [10gensel87J the two best d usterngs discovered are
Color of engine Aheels IS Blackmiddot middotWhitemiddot
W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd
subs-~c~~re discovery =odule is
lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt
In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By
addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI
exa~Fles Je ~wo best lusterngs discovered lre
umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four
Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec
JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure
attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen
har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE
T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~
suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the
StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne
IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree
that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators
within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover
macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~
CiscJsses similrlties and differences between SLBDLE and PLoSD
Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure
of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The
Jperators forths conaln are taken from [isson80] and are repeated e10w
PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY
PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)
STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)
LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)
For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec
e
51
STACK(XZ)
~ CSTACKCYZ) PICKLPOi)
PCTDOW(Y)
Figure 49 Diseovered (alto-operator
system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy
vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be
~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex
45 Eltperiment 5 Data Abstraction and Feature Formation
EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he
cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3
arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng
on the same Tu3S Instruments Eplorer as the SLBDCE system
Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les
given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores
to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in
the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of
178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J
5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C
SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c
61
--
~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll
Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~
hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH
program searCles for tWo tpes of substrucure In tbe blocks world domain The first type
involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a
set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by
the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes
of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall
1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he
ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of
SlBDLE
lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S
cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce
OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence
~ ~ shyB
I ~ (
E G- c c=7 0 IA B CD
i Lr
(a) ( )
63
A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1
0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL
E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL
The tommcm-realiotships-list contairs the four relations Ossessed by more than half the
candidates
Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt
~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le
Sumber of propUiH In Intltwto
Number of propenlH n Ilnlon
vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-
ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre
A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5
65
~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e
ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~
difficult Running both tbe sequence and common relation discovery processes nore tban once
mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl
difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd
domain
T~e final comFarlson of 11e two syseS Involves ~he representation used to store the
discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the
subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe
sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot
lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1
roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse
Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines
comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe
semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto
oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0
SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert
background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l
more general epresentaticn such as the semantic network used y ~he ARCH pro~ran
53 COfDitive Optimization in olrs S~R Program
R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte
~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt
[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S
Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-
-(5-s)C =---shy
s
slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s
represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1
polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a
inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR
seeks a grammar whose eiementS m8llnize eVe
S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n
322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number
of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f
OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings
of he substrucre can be upressed as
Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy
cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not
conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on
he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into
~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture
lmprovements to tbe cognltive savings measure
~ Substructure Oiscovery ill Uihitehalls PLAD System
Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons
These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn
69
The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as
Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs
efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda
overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta
mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence
Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ
COXTEXT 1 ogsav macro OS
100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot
8 5 ~u - (~tmiddot z)middot
Instantiated Action Sequlnce lt B 112 Z ~11J Z
COlEXT 2 ogsav macro~s
20 ~l bull (~lumiddot Z)shy
Instantiated Action ~uence A B 11
Al interesting macrops discovered End
Figure 53 PLAXD Etam~le
1
OLPTER 6
COSC1USION
Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S
of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty
Url about such an environnent the entity nust abstract over unInuresting jetJil and discover
substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s
teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced
domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l
this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve
61 Summary
The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl
onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s
aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e
environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge
equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl
aleratves nvoiec In 3 computational method 01 discovering sustrucure
A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y
he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans
omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg
t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre
imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5
T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~
iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~
he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~
optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~
The ~ialization module modina tile substructure to apply in a more constrained ewironme
Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i
nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S
aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f
be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes
and the possible applications of the SLBDLE substructure discovery system in a vanety f
domaltls
Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf
~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-
lass vf problems in the area of nachine learning OJeraton In real-world domains demands f
eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates
n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll
of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes
Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1
S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~
62 Future Research
$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE
system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei
include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS
Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy
reVIOUS suess of silIlLu rJe appiiauns
esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of
substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can
tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]
lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can
be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions
in both the background knowledge architecture 3nd the opeations peforned by the discovery
algorlthm
Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess
may arise from a better method of substrlctie instantiation C rrent letbocs do not
satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation
by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C
lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres
exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba
Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods
would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery
process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt
the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be
discovered with a single application of the discovery algcnthr1
Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~
uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve
the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery
rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge
z~ess l ~
One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount
knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a
knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts
623 Applications
Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste
SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and
compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung
systems
One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage
composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e
eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon
slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of
abstraction
The n~ to access information fom larse databases has betooe oonlonlac In ncs
omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so
do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver
repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard
Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte
database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture
sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1
9
REFERE~CES
[Doyie79J
[HoaS3]
L --]~ ason
~Iicalski33bl
G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)
J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162
J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27
R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288
K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987
W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933
W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947
J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J
General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16
J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977
D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC
~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239
R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy
R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13
R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363
S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3
T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50
J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp
~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~
83
- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es
DeJ IS 11
dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T
s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11
nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11
race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T
Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order
=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je
~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form
(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I
The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC
The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure
ltela01s s the list ~f ~eatlons deining le substrucure r occurenc
In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage
dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec
85
gt bull ~
)~c1 tt I
0 c5 ~ -
13 Output for First Example of Experiment 1 Limit - 6
3qll rilebullbullbullbull
anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~
Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll
S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi
-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I
(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l
Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD
-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i
S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)
11 aCC
A14 Output for First Example of E(periment 1 Limit 10
gt svIdue liait 10)
n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t
Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco
T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~
S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO
~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS
~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J
A 16 Input for Second Example ot Experiment 1
Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J
conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)
Al7 Output for Second Example ot Etperitnent 1 Limit - 4
l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I
5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs
c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I
middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt
middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I
So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS
coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j
ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103
~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~
89
S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a
ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO
~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H
ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09
lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()
~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull
[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I
l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i
[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~
C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i
SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1
co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i
middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))
Ec ~lce
Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10
Betti ~Ice
i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~
umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1
SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -
1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G
91
s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no
TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~
~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f
=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ
~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i
A2 Experiment 2
Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd
saostructure background knowledge nodule Two examples from the chemical dooain ae
eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng
previously disc ered subst-uc~ures to subsequent discovery tasks
11 Input for First Eumple of poundXperimenl 2
kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i
S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))
slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)
sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J
~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI
Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))
VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)
( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I
(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i
(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I
sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)
Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j
SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)
~o e ) ~Wlnt ubstUC~oltes
Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo
95
gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~
U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l
~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J
carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )
Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))
Defxollrii ilill J ~cIl Clr carJ)
I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)
Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J
ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))
A32 Output for Experiment 3
gt lubclue inl~ JO)
Seli tlcebullbull_
m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil
SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo
aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot
ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~
licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)
Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc
wri OCCtilJtOiCS
101
4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4
~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i
(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))
42 Output for Experiment 4
PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil
Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)
ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl
u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1
O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~
s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI
W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -
~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a
S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1
TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a
103
~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~
Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-
bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)
ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)
I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))
r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l
(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)
(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)
coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)
Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i
(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at
Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e
(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ
~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj
OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs
I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten
~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull
9
SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1
I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1
Addina substructures 0 SKbullbullbull
O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1
5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I
A3 Experimeut 3
Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15
hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le
desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e
99
sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~
5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~
1~Ie c2 lt6 ~u)
QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32
c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)
lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J
SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))
A52 Output for Experiment S
gt (sulxh bullbull lmu )
Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll
RUMinl sulntucture discovey
DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl
Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~
WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11
( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I
(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I
[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt
~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90
Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J
ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot
~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I
10$
bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D
icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)
([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD
((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I
([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~
101
middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a
oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0
CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1
Ed ~lCe
109
2 - Background Knowledampe
Attlough he substructure disccvery tecbrl(~1es descoed 1 the -rc5 sec~o-s
wmiddotthout prior knowledge of the domain ~he application of background knomiddotI ed~e ~a1 CIW e
discovery process along more promising paths through tbe space of alternatlve substructJres T~lS
section considers background knowledge in tbe (orm of substructure definitions that is c~ncicate
substructures that are more likely to list in tbe current application dOClaln
The ~wo major functions of tbe substructure backround knowledge are to main tain boh
lStr-denned and discovered substructures and to dete-nme which of tbest substructures etlst In a
glven set of input uamples In order ti) take muimum advantage of tbe hierarchlCal nature of
substructllres tbe background knowledge is uranaed in a hierarcby in whicb compiu
substructures are defined in terms of more primitive substructures This arrangement suggests ~n
archltKure similar to tbat of I trutb maintenance system (115) (Ooy1e 79] Prlmltie
substructures serve as the justifications for more complex substructures at a higher level in tle
hierarchy When a new substructure is ldded to the background knowledge prmitjmiddote
substructures are justified by the ObjKts and relations In tbe new substructure These r-rmltie
substructires provide iattial justificatlon for tbe new substructure Fur~herlore tbe nrs arcbltKture trovides a simple process (or determining wbicb background knowledge substructures
elst n a gIven set of input eumples This deteraination can be accomplisbed oy 5rst justifying
tbe oeiltlons It tbe leaves of the backaround knowleclge hierarcby vith tbe relatons n tbe nput
uamples TIen initiate tbe normal nf5 propagation o~ration to indicate vhich substuctiJres are
ultimately justified by the relations in tbe input eumples The substructure discoey roess
may then use these background knowledge substructures as a starting point for ieneratllg
alternative substructures
Other f Jrms of background knowledge apply to the task of substrlcure ilscovery
mebaUs PL-O systen ~Whiteha1l87] for discovering substructure In sequences or actolS ~ses
~bree levels of cackiound knowiedge to guide the discovery process PL-D~ JIsecth ee
1
L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do
BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do
if (not (member e 0raquo S S U leI
return D
Figure 24 SubstrUcture Discovery ~lgoritbm
Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom
the substructures In S until either the computational limlt L is exceeded or the set of candidate
substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe
heuristics of Section 2A are employed to choose the best substructure from the llternatives in S
The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he
implementation Section 322 descibes the comrutatlon used in StBDlE although otbe
methods may be used For instance the heutlStics could be weighted selected by lhe user or
selected by lhe backcround knowledae However uy substructure selecion method should
involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in
BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of
discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi
Section 23 are then used to construct a set of new substructures that are stored in E Those
subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the
loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res
23
CHAPTER 3
mE SUBDUE SYSTE~I
An implementation of he substructure discovery methodology descrIbed in the frevous
chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments
Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a
substructure concept discoverer and as a mod le in a more robust nachlne learning system In
addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a
sbstructure specialization module and a substructure background knowledge module for utiiizilg
prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd
krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres
ovillCh of these substruc1res are resent in the lnput examples
i
SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of
SubstructJre Speeializer-k of(
C ser-Supplied Input Exampies Substructure Discovery
Figure 31 Tlle SlSDLE System
(a) SUbstructure
[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]
(b) External Representation
I I
~ Sl
i V
~-T t -
I
~ Cl i
(c) Inte~al Represe~tation
Figure 32 Substructure Representation
21
SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do
SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI
return -EWSlBS
Figure 33 Substructure Elpanslon Algorithm
Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae
stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are
stored in For each neighboring relation a new substructure is formed by adding the nelghborrg
-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to
the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een
considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes
~xanded from the original substructure
322 Selection
The heuristic-based substructure discovery module selects for orslderation those
substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs
compactness connectivity and covenge These four heurlstus are used 0 order he set of
alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut
ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~
selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons
~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he
eurstlc value of a substructur
29
urlent set of Input xam-llS
deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e
substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le
Input examples
r rtlations Sl compactnns(S) = ----shy
For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus
conpactness bull 4 bull 1
The third heUristic connectivity measures the amount of extemal connection in 1he
occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of
external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be
onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed
connectivit-Spound) = locel
Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In
Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure
22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO
outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons
Thus connectivity (12 41 bull 13
T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es
31
ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt
First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~
tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as
there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle
substructure as denned in Section 3U The Object names within the substructures le aroltrar
symbols generated by the system for eacb ewly constructed substructure
S bull i lt 11 OBJECT-0001gt f
~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S
and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by
addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5
ae st)red In E
Input Example Substructure
amp i 511
~I Rl I- lSi ~
52 I S I
LJ L
Figure 3-1 Simple Example
33
11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~
1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l
le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~
suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~
by the algorabm
33 Sllbstructure Specialization
The substructure specialiution module In SLBDLE employs t simple teChlllq ue r
s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25
SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added
attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e
)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh
the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e
i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota
eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d
knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e
s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton
rocess
331 Specia1ization Algorithm
Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te
algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~
elamrles
he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~
Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l
osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~
eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e
35
~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor
Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle
mount of specializatlon In Srpc
A-rount_of_spKlaLization( S_) = --_______ (occl
The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture
acKiround knowledge along w1tb the originally discovered substrucure
332 Example
As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a
lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute
-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he
sare substruc~ure emerges as in Figure 3-
S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt
ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule
Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES
A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l
~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli
s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt
Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~
SPECSLBS In thIs example IS
S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt
T~is specialized substructure also las J occurrerces but 3 untque values thus
amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC
specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he
substrucnre background knowledge along with the oriiinally discovered substucur
SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon
about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be
s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred
substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq
SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue
nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization
nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be
mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec
s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks
3 SubstMlcture Background Kllowledge
T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts
39
O T SHA ETRL~GLE SHAPE-SQLARE
Figure 37 Substructure Backgolnd KnowledgeEumpie
ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt
su~ported
As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge
lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle
ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s
SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y
Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute
est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt
he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn
n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and
~1
1--- ~ed v blue ~
i I shyI
I~
I
I
I
I
SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE
Fgure 39 Three Background Knoledge $ucstroJcJres
he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e
new substructu-es In terms of the substructures already known
3 bull 42 IdentifyiD1 Substructures in Eumples
As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-
~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___
~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I
i SH~PETRIAGlE i t
I I
I I I SHAPEmiddotSQlARE
[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]
Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]
Fig-ure 310 Substructure IdentiAcation Eumple
substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles
disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks
--
lll=H Eumple - r
i III
I I H
8r- shyj
V V
Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red
C Value(S)a8
L 6 a
al uelt S)-312
alueltS)-21
0 alue(S)96
middotalue(S)-11
6 I
bt
10 ~ V
- -
I
alue(S)middot31~ aue(S~-96 -- - middotalue(S)92
A I
j
aI ValueS)-312 I
I I
I bull
---
- -I
Vaiue(S)-103
rgure 41 First Etample of EXiIrinent 1
elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS
of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll
EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$
overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0
consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r
be Input examples
2 Ex~riment 2 Specialization and Background Knowledge
The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem
Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0
~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS
lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege
During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task
As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut
discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs
acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa
Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed
subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk
The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure
-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e
cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s
shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese
discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized
substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed
rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces
the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed
substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe
background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to
deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe
background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres
Ii
C 0
H- C C - Ii C1
(Sr C1 rl
C C ~
C C C-H C C- ii C C-Ii 1 i
~ Sr- C C C
H-C C-Ii H C
Ii H
H
fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule
13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples
tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~
of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna
clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn
whIch to focus ile attentIon of the conceptual clusterI~ rocess
A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA
Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer
lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific
Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e
added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features
t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and
CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered
Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe
SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas
r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single
)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d
subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle
substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure
spannng ncre han one ualple
Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni
and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle
53
nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy
descrlOed il [10gensel87J the two best d usterngs discovered are
Color of engine Aheels IS Blackmiddot middotWhitemiddot
W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd
subs-~c~~re discovery =odule is
lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt
In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By
addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI
exa~Fles Je ~wo best lusterngs discovered lre
umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four
Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec
JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure
attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen
har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE
T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~
suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the
StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne
IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree
that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators
within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover
macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~
CiscJsses similrlties and differences between SLBDLE and PLoSD
Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure
of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The
Jperators forths conaln are taken from [isson80] and are repeated e10w
PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY
PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)
STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)
LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)
For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec
e
51
STACK(XZ)
~ CSTACKCYZ) PICKLPOi)
PCTDOW(Y)
Figure 49 Diseovered (alto-operator
system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy
vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be
~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex
45 Eltperiment 5 Data Abstraction and Feature Formation
EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he
cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3
arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng
on the same Tu3S Instruments Eplorer as the SLBDCE system
Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les
given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores
to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in
the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of
178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J
5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C
SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c
61
--
~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll
Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~
hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH
program searCles for tWo tpes of substrucure In tbe blocks world domain The first type
involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a
set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by
the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes
of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall
1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he
ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of
SlBDLE
lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S
cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce
OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence
~ ~ shyB
I ~ (
E G- c c=7 0 IA B CD
i Lr
(a) ( )
63
A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1
0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL
E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL
The tommcm-realiotships-list contairs the four relations Ossessed by more than half the
candidates
Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt
~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le
Sumber of propUiH In Intltwto
Number of propenlH n Ilnlon
vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-
ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre
A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5
65
~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e
ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~
difficult Running both tbe sequence and common relation discovery processes nore tban once
mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl
difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd
domain
T~e final comFarlson of 11e two syseS Involves ~he representation used to store the
discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the
subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe
sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot
lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1
roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse
Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines
comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe
semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto
oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0
SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert
background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l
more general epresentaticn such as the semantic network used y ~he ARCH pro~ran
53 COfDitive Optimization in olrs S~R Program
R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte
~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt
[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S
Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-
-(5-s)C =---shy
s
slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s
represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1
polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a
inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR
seeks a grammar whose eiementS m8llnize eVe
S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n
322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number
of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f
OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings
of he substrucre can be upressed as
Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy
cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not
conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on
he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into
~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture
lmprovements to tbe cognltive savings measure
~ Substructure Oiscovery ill Uihitehalls PLAD System
Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons
These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn
69
The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as
Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs
efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda
overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta
mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence
Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ
COXTEXT 1 ogsav macro OS
100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot
8 5 ~u - (~tmiddot z)middot
Instantiated Action Sequlnce lt B 112 Z ~11J Z
COlEXT 2 ogsav macro~s
20 ~l bull (~lumiddot Z)shy
Instantiated Action ~uence A B 11
Al interesting macrops discovered End
Figure 53 PLAXD Etam~le
1
OLPTER 6
COSC1USION
Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S
of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty
Url about such an environnent the entity nust abstract over unInuresting jetJil and discover
substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s
teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced
domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l
this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve
61 Summary
The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl
onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s
aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e
environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge
equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl
aleratves nvoiec In 3 computational method 01 discovering sustrucure
A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y
he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans
omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg
t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre
imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5
T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~
iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~
he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~
optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~
The ~ialization module modina tile substructure to apply in a more constrained ewironme
Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i
nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S
aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f
be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes
and the possible applications of the SLBDLE substructure discovery system in a vanety f
domaltls
Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf
~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-
lass vf problems in the area of nachine learning OJeraton In real-world domains demands f
eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates
n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll
of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes
Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1
S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~
62 Future Research
$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE
system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei
include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS
Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy
reVIOUS suess of silIlLu rJe appiiauns
esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of
substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can
tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]
lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can
be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions
in both the background knowledge architecture 3nd the opeations peforned by the discovery
algorlthm
Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess
may arise from a better method of substrlctie instantiation C rrent letbocs do not
satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation
by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C
lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres
exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba
Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods
would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery
process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt
the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be
discovered with a single application of the discovery algcnthr1
Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~
uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve
the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery
rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge
z~ess l ~
One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount
knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a
knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts
623 Applications
Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste
SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and
compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung
systems
One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage
composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e
eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon
slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of
abstraction
The n~ to access information fom larse databases has betooe oonlonlac In ncs
omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so
do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver
repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard
Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte
database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture
sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1
9
REFERE~CES
[Doyie79J
[HoaS3]
L --]~ ason
~Iicalski33bl
G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)
J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162
J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27
R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288
K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987
W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933
W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947
J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J
General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16
J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977
D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC
~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239
R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy
R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13
R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363
S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3
T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50
J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp
~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~
83
- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es
DeJ IS 11
dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T
s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11
nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11
race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T
Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order
=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je
~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form
(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I
The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC
The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure
ltela01s s the list ~f ~eatlons deining le substrucure r occurenc
In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage
dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec
85
gt bull ~
)~c1 tt I
0 c5 ~ -
13 Output for First Example of Experiment 1 Limit - 6
3qll rilebullbullbullbull
anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~
Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll
S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi
-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I
(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l
Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD
-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i
S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)
11 aCC
A14 Output for First Example of E(periment 1 Limit 10
gt svIdue liait 10)
n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t
Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco
T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~
S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO
~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS
~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J
A 16 Input for Second Example ot Experiment 1
Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J
conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)
Al7 Output for Second Example ot Etperitnent 1 Limit - 4
l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I
5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs
c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I
middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt
middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I
So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS
coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j
ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103
~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~
89
S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a
ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO
~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H
ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09
lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()
~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull
[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I
l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i
[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~
C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i
SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1
co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i
middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))
Ec ~lce
Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10
Betti ~Ice
i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~
umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1
SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -
1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G
91
s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no
TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~
~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f
=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ
~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i
A2 Experiment 2
Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd
saostructure background knowledge nodule Two examples from the chemical dooain ae
eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng
previously disc ered subst-uc~ures to subsequent discovery tasks
11 Input for First Eumple of poundXperimenl 2
kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i
S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))
slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)
sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J
~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI
Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))
VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)
( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I
(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i
(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I
sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)
Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j
SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)
~o e ) ~Wlnt ubstUC~oltes
Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo
95
gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~
U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l
~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J
carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )
Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))
Defxollrii ilill J ~cIl Clr carJ)
I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)
Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J
ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))
A32 Output for Experiment 3
gt lubclue inl~ JO)
Seli tlcebullbull_
m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil
SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo
aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot
ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~
licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)
Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc
wri OCCtilJtOiCS
101
4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4
~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i
(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))
42 Output for Experiment 4
PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil
Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)
ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl
u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1
O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~
s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI
W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -
~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a
S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1
TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a
103
~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~
Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-
bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)
ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)
I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))
r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l
(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)
(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)
coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)
Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i
(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at
Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e
(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ
~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj
OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs
I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten
~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull
9
SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1
I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1
Addina substructures 0 SKbullbullbull
O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1
5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I
A3 Experimeut 3
Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15
hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le
desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e
99
sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~
5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~
1~Ie c2 lt6 ~u)
QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32
c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)
lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J
SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))
A52 Output for Experiment S
gt (sulxh bullbull lmu )
Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll
RUMinl sulntucture discovey
DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl
Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~
WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11
( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I
(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I
[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt
~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90
Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J
ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot
~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I
10$
bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D
icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)
([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD
((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I
([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~
101
middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a
oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0
CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1
Ed ~lCe
109
L - limit on comlutation time S - IbaSt sUbstru~ture51 O-l wilde (amountof30mputatlon lt L) and (SIIi 0) do
BEST-StB - bestsubstructure selected from S S - S - IBEST-SlBI 0-0 U BEST-StBI E - alternative substructures generated from BEST-StBI for each e in E do
if (not (member e 0raquo S S U leI
return D
Figure 24 SubstrUcture Discovery ~lgoritbm
Tbe nezt step in the algonthm is a loop that continuously generates new substruc~ures foom
the substructures In S until either the computational limlt L is exceeded or the set of candidate
substruc~ures S is exhausted The loop begins by selectlng the best substructure in S Here tbe
heuristics of Section 2A are employed to choose the best substructure from the llternatives in S
The acual computation perforled to comiute tlle heuristic evaluation function depends on ~he
implementation Section 322 descibes the comrutatlon used in StBDlE although otbe
methods may be used For instance the heutlStics could be weighted selected by lhe user or
selected by lhe backcround knowledae However uy substructure selecion method should
involve the four heuristics described in Section 2A Once seiected the best substructure IS stored in
BESTmiddotStB and removed from S iext if BEST-StB does lot already reside in the set 0 of
discovered substructures then BEST-SLB is added to O The substruct~re generaton methods oi
Section 23 are then used to construct a set of new substructures that are stored in E Those
subsucures in E hat have not already been conSIdered by the algorithm are 3dded to S and the
loo~ repeats When tbe loop terninates 0 contains ~le set of alscovCred subitrJt~res
23
CHAPTER 3
mE SUBDUE SYSTE~I
An implementation of he substructure discovery methodology descrIbed in the frevous
chapter is contained in the SlBDLE system Wriaenn Common LISp on a Teus InslrJments
Elplorer the SlBDlE rogram facilitates the we of the discovery aigorithm both as a
substructure concept discoverer and as a mod le in a more robust nachlne learning system In
addition to the leurlstic-oased substructure discovery module SCBDCE also Induoes a
sbstructure specialization module and a substructure background knowledge module for utiiizilg
prevlowly disc)vered substructures in SUbsequent c1isc)very tasils The substrucure bJckgrourd
krowl~ge holes both user-defined and discovered substuctures in a hierarchy and de~ernres
ovillCh of these substruc1res are resent in the lnput examples
i
SlbsrJctureLser-Supplied gt EE----31lt1 Background Knoledie ~~----Background Knowledge of
SubstructJre Speeializer-k of(
C ser-Supplied Input Exampies Substructure Discovery
Figure 31 Tlle SlSDLE System
(a) SUbstructure
[SHAPE(Tl)-TRIA-iOLE][SHAPE(Sl )-5QLARE][SHAPE( Cl )-ltIRctE] (O~(T1S1 )-TIOoi(S1Cl)-T]
(b) External Representation
I I
~ Sl
i V
~-T t -
I
~ Cl i
(c) Inte~al Represe~tation
Figure 32 Substructure Representation
21
SlS bull descrltton of substructure 0 ~ eIanded EWSLSS bull I bull lnelgbboring relaLions of the occurrences of SlSI f oreach n in do
SLB bull new substructure forned by adding n to SlB -EWSLBS bull EWSlBS U I-SLBI
return -EWSlBS
Figure 33 Substructure Elpanslon Algorithm
Figure 33 shows the substructure elpansion algorithm The new etpanded substructu-e5 ae
stored in EWSLSS initially an empty se~ Fim the set of neigbboring relations of SLB are
stored in For each neighboring relation a new substructure is formed by adding the nelghborrg
-elatlon to the orIginal substructure description SlE The newly formed subst1u~ure IS added to
the set of e1panded substructures -EWSlBS After all possible neighboring eia~lons Jave een
considered the elpansion algorithm returns EWSlSS as the set of all possible suostructes
~xanded from the original substructure
322 Selection
The heuristic-based substructure discovery module selects for orslderation those
substructlres that score bighest on the four heurlstics introduced in Section 2a ogltve savlngs
compactness connectivity and covenge These four heurlstus are used 0 order he set of
alternative substructures based on their heuristic value in the contut of the carrent set of 1~ut
ellmples With the substructures ordered irom best to worst substrlctu-e selectlon redces ~
selectlng the tim substtuctlre from the ordered list ThlS sectIon descrlbes the computalons
~n olved in the calculation of each heuristic and ho tlese resuts are commed to yeid he
eurstlc value of a substructur
29
urlent set of Input xam-llS
deftoed as the ratiO of he number of relations In t-le substrmiddotc-wre to he nunber of objects in e
substructure Lnlike ccgnltive savings the compactness of a substructure is ldependent of le
Input examples
r rtlations Sl compactnns(S) = ----shy
For each of the substrutures In Fgure - lItrelationsS) bull 4 and -objecs(S) bull 4 thus
conpactness bull 4 bull 1
The third heUristic connectivity measures the amount of extemal connection in 1he
occurrences of tbe substructure C)nnecivityS defined as the Inverse of the average number of
external connections found in all occ~rrences of rle substr1JctOre in the Input uaopies Thus be
onnelttivlty of a substructure S With Jccur1ences ace In the set of input examples E IS omputed
connectivit-Spound) = locel
Agaln consider Figure 22 Each substructure has four occurrences in he input tampe In
Figure 22a each occur-ence has one external onnection thus connectvlty bull (4 )1 bull 1 In F~ure
22b and Figure 22c the tWO innermost occur-ences both have 4 ene1ai connec~lons and te tIJO
outermost occurrences both bave 2 enernal connections for a total cf 12 uernai conlectlons
Thus connectivity (12 41 bull 13
T~e final heuristc ovenge neasures the amount c saucue 11 he r-Jt ~XJ~~es
31
ltSHAPE(TlJ 7Rl~-CLlSHAPT) TR -G[SHAE(-3~ -~~GL= SHAPET4) TRI CLE(SH Ppound(SlJ SQl RErSHPES~ bull SQC R [SHAPES3) bull SQcAREI[SHAPE( 54 bull SQCARpoundJSHAE( i 1) bull REC-CLE [SHAPECl) bull CIRCL[ONTLS) bull T]O-HT~S2 bull -][0(-531 bull 7 [ONTJ5a) -(ON(SlRIJ T[ONCRl) T[0ti7) Tl O~mUT3) bull -(ONUUT- bull rJgt
First tbe algoritbm forms tbe Stt S of base substructures Inltlal1y S bas only one eiene~~
tbe substructure denoted by lt 11 OBIECToool gt This substructure bas as many occurences as
there are objects in the input example Tbe number before a SUbstructure is the wzi14 of tlle
substructure as denned in Section 3U The Object names within the substructures le aroltrar
symbols generated by the system for eacb ewly constructed substructure
S bull i lt 11 OBJECT-0001gt f
~ext we enter tbe loop wbere tbe best SUbstructure in S is stored in BESTmiddotStB reooved from S
and inserted in the set 0 of discovered substructures ut BEST-StB is minimally expanded by
addmg OM negbboring relation to BEST-SLB in all possible ways The newly created substrJue5
ae st)red In E
Input Example Substructure
amp i 511
~I Rl I- lSi ~
52 I S I
LJ L
Figure 3-1 Simple Example
33
11 bs lampie t1e Ut substlc~ure t) gte crsldered lt~5 [SpS C3ECmiddot)J(~
1U1-ltOLEl (O~(OBIECTmiddotOOCOaJECvOO~) rj [SH PE~OBJLmiddotXc bull sot AREgt ~=e~~ l
le cest substJc~ure RprCless of the amount of acdlonai OClLJton bs SmiddotlOS~-~ ~
suesJe Fgue 3 amp) WIl be the best element in the set of disOHred sJbsrUes -e~ -~
by the algorabm
33 Sllbstructure Specialization
The substructure specialiution module In SLBDLE employs t simple teChlllq ue r
s~laizi1g a substructure ThIS technique is based on the method c~ibed in Slaquotlon 25
SLBDLE specalizes a substructure by conjoining one 3tibute relation The value of ~lle added
attribute reiatlon is a disjunction of tae values observed in tbe attrlbllte relations onneced to e
)CCJrrelces of ~he substucture To avoid over-specalization tle substructure is onJolned WIh
the disJ1nc~ive attrbute relation rpresentlng the minImal amount of spe1lalizatton 3mong ~e
i=0ssloie dsJunctive t tbute relations of tbe Slbstruc~ure ~tore specfic substJctures middota
eventually be considered afer less specific subs~ructues have been st~red in ~le ackgro~d
knowedse found in subsequent eumples and ur~he specalized Secton 331 iescibes e
s bstructure secalizatlon algorithm and Section 332 il1l1States an eumple of he speclallzaton
rocess
331 Specia1ization Algorithm
Tle substructure specialization algoritbm used oy StmiddotBDLE is snow in Fgure 3 Te
algoritnm returns all possible specializations for l given substr~cttre in the current set of ~~
elamrles
he agorithm proceeds as follows Given a sxbstrucure S witl ~ccurTIces oce n the se~
Inpu euc-les E the substrJcture specalizatiort algortttm 1 Figure 3 retrs ~he ~et 51 l
osslcie specaliutons Jf S Ttese srecialiutlons ill be colleced ll SPECSLBS at 5 rl~
eny T-te 3gOltba begms by storing in AITRIBLTES all he attrbute -eltcrs ll e
35
~7-- a -d o~ ~s~middotmiddot - ~ ~ -a S 11 stor-~ n so st-u ra loa - - d c ecg~bull ~ - --- ---- bull -vant a lt H xsor
Tllerefce - a s~bstructure nely added attrbutes ~RELCOBJ)Lmiddot~IQLE_ - ALLES] and occurrences OCC the following formula is se 0 oeasure tle
mount of specializatlon In Srpc
A-rount_of_spKlaLization( S_) = --_______ (occl
The sucst1cureJltb le smaliest lmount vf specaliutlon ill hen be stored in tbe substrlcture
acKiround knowledge along w1tb the originally discovered substrucure
332 Example
As an eIanie vf ~le substructure specialiZ3tion rocess consider Figure 36 Figlre 36a
lIusntes te same Input example of Figure 3-- Wltb the additIon of several oio attlbute
-elatons After runrllng le beuristic-based suOstrlc~ure discovery algoritbm on t~IS eaat=le he
sare substruc~ure emerges as in Figure 3-
S lt (SHAPEIOBJECToool 1nIlGLE][SHAElOBJECT-000 JSQtA~ (ON( OBJECT ooolOBJECT -0001 )7gt
ex~ be ewiy diseovered substructure is sent to the substructure Secii1ita~ion nodule
Frst all tbe attibute reLations of tbe occurrences of the substruc~ure ue stored in ~TTRIBLTES
A TTRBlTES bull [COLOR( T1 lRED] [COLOR T ~REDI [COLOR 73lBL E (COLOR T 4 lSLEJ [COLOR S t )-GRE~l COLOR( s aLlE (COLOR(SJ 1aLLE] [COLOR(S41REgt1l
~elt he Iirst Itbu~e -eltlon in ATTRIBLTES s given a middotmiddotabe bull and addec 0 he grli
s- bull lt [COLOilaquo OBJECTmiddotOOC BLCREJ ISHAS( OSJEC middotOC7~AG1 (SHAPE OB1ECT-OOOllSQLHE[C--WBEC7middotmiddot)CO~OBjEC- middotJ0C -gt
Srpee bas J OCC1rrences and 2 unique val1es In the new~y added attribute relalor b~
SPECSLBS In thIs example IS
S~ - lt[COLOiU OB1ECT-0001 )-BLCEGREE~REDJSHAPE( OBJECT-000 )TRL~iGLEl [SHAPElOB1ECT-OOOll-SQl AREllON( OB]ECf-OOOOB1ECT-0(01)rIgt
T~is specialized substructure also las J occurrerces but 3 untque values thus
amount_of _srlaquoalizatlonCS ) bull 34 The tirst specialized substructure bas a smaller amount of sprtC
specalzatlon Tlerefire only tbe tirst substructure scown in Figure 36b is added ~o ~he
substrucnre background knowledge along with the oriiinally discovered substucur
SpecIalizing the substructures discovered cy SlSDlE adds to the suostucures nrormaon
about he cmtext in which tle substructures are ikely to be found B llnlmally specalizing be
s~bstructure SLBDCE avoidS adding contutual Information that is 00 specific If tbe ceslred
substructure concept is more speciAc than that )otamed hrolgcminunal specialiuton ~eciaLzq
SImilar )ucstruc~ures In subsequent uampies will transiorm the under-consrained SUbstlJctue
nto he desired oncept There is the possibilit that even tbe ninimal amoun t of specalization
nay over-constrain a substructure SlBDCE can oecover from tbis jroblen because both be
mglnal lnd specialized substructures are retained in the background knowieage The unspIClaizec
s~bstrJcure will almiddotays be available for appiicOltlon to sUbsequent discovery tasks
3 SubstMlcture Background Kllowledge
T1e substlcure background knowledge lidule in SlBDLE nas two naoi f1lnctlcts
39
O T SHA ETRL~GLE SHAPE-SQLARE
Figure 37 Substructure Backgolnd KnowledgeEumpie
ttac~ly WO Justifications When both Justifications are supported tlle slbstrlcture node 5 aisgt
su~ported
As an eumple ecall the substructure shown in Figure 36a The backgTOlnd ~1owedge
lie3Tchy for llis substructure IS shown in Figure 3 i The question aark appearing n tlle
ueoarclly represents an object Thus the substructure containtng ~he q uestton oark s
SHoPE(X)-TRL-lCiLEl[Ol(Xyen)-Tl where the question cuIlt corresponds to the object argumet Y
Other objectS in the hierarltly are represented by tbe ictorial equvalet cf their sharNl attrIbute
est suppose eitller ~le user or StBOCE wantS to lde tte substruClue from Fgure 3a tgt
he subsaucture backgT~und knowledge hiermhy n Figure 37 The resulting Ielcby is shoJoIn
n Figure 38 T~is ~uer3T~ 5 Jbtaineci by fist rtlti1~g ~be new sustrJctiJe as l~ ~unFie and
~1
1--- ~ed v blue ~
i I shyI
I~
I
I
I
I
SH-PE-cIRCtE 0-T I iSHAPE-TRIAGtE SH PE-SQt ARE COLOR-REDatLE
Fgure 39 Three Background Knoledge $ucstroJcJres
he user or by SlBDLoE he substructure back5ound knolecge 50S incrementally ~o oenne e
new substructu-es In terms of the substructures already known
3 bull 42 IdentifyiD1 Substructures in Eumples
As des-bed i~ Se~un 2S tbe substruc~m~ ciscuvery Jlsmiddotrtll d~es r te set )i r-
~ 0 71S 1T~ SH~PE 71 )-TRIAGLEj SHAPE(SU-SQCARE [0( T~ S2-ilSHAPE( T2 ) TRIAGlE iSHAPE(S2)-SQt AREJ [O~ T3S3)-n [SHAPE(T3)-TRlAGLE] [SHAPECS3)-sQtARE1 P [O(T~S4)-T] (SHAFE(T 4)-TRIAGLE] [SHAFECS4)-SQt ARE] ___
~1[0(T1S1)-T] ~SHAPE(Tl )-TRIAGlE] [O(T2 S2 )-T] [SHAPE(T2)TRIAGlEJ [0( T3 S3)-T] [SHAFE(T3)TRIoGLE] 6 (O~(TS4)-TJ (SK-FE(T4)-TRIAGLEl I
i SH~PETRIAGlE i t
I I
I I I SHAPEmiddotSQlARE
[O-i(TlSl)-Tj SHAPECTl )-TRI-GLE] ~SHAPE(Sl -SQL ARE] [0-i(T2 52)-TJ ~SH~FE(n)-TRIA-GLE] [SHAPE(S2)-SQCAREl [0(T3S3)-T] SHAPECT3 )-TRI- GLE] [SHAPE(S3 )-SQC AREJ [O~T~S)-TJ [SHAFE(T~-TRL-iGLE] [SH-PE S4 )-SQCARE] [O(S 1R 1)-T] [O~(C1R1)-TJ [OCRlT2)-T]
Base ode Support[O=l(R 1T3)-TJ [OX(RlT4)-T]
Fig-ure 310 Substructure IdentiAcation Eumple
substuc~re backgnund knowiedge but both specialized and l1specalized subsrucles
disclvered by SLBDeE an be e~ained or use in subsequent sub$truct1re diScove-y tasks
--
lll=H Eumple - r
i III
I I H
8r- shyj
V V
Cvmcutauon Lmlt Three Bes~ SUbStr1Ht~res Dlsc~red
C Value(S)a8
L 6 a
al uelt S)-312
alueltS)-21
0 alue(S)96
middotalue(S)-11
6 I
bt
10 ~ V
- -
I
alue(S)middot31~ aue(S~-96 -- - middotalue(S)92
A I
j
aI ValueS)-312 I
I I
I bull
---
- -I
Vaiue(S)-103
rgure 41 First Etample of EXiIrinent 1
elaton Thus le secmc iuostrUtture in the 5st rOl vi Figure middotU s ltSES X laSOriS
of bac~ ~ f middote middot5middot smiddot ~S-c -fmiddot-~ cmiddot - - Al~sence lr_ 10 ecgeabull_ ~ - - rbullbullbull - e s_ e-middot ll
EIeriment 1 indicate tlat tle limit need not be set much hIgher ~lan the cesied size f ~he e$
overall substructure is indeed of that size The leurstics approprIately COl$a~n the searcb 0
consider substucres alorg a path of nreasing heuristIc value towards ~be best subsucttr r
be Input examples
2 Ex~riment 2 Specialization and Background Knowledge
The ability to re~aln newly iiscovered knowledge is beneficia to any learmng sysltem
Applying this knowledge to slmilar asks can greatly educe the amount of procssing -equired 0
~rfcrm tbe ask SLBDCE tlkes advantage of thIS idea by speiializlng discovered substructS
lld retaining both specialized and inspecaized sbstrucures In the backgf) und ~now leege
During subsequent discvery asks SLBDLE applies he KlOWn s1lcstrlctures to the c-rrent task
As more eumples from Similar domains are r)cessed nc-easingiy compln subst-Jctures Ut
discovered In terms of nore primItIve SlbstJcures 3lready known EVeTItJally SLBDLEs
acltground inomiddotledge becoQes a hierarchical -eresentatlon f he Si-~re in e oIa
Elelment 2 demonstrates SLBOCEs abmty 0 s~lllze lnc rean roevy dlsove-ed
subs~uclres ad illustrates bow ~hese substrJcures l~nt be 3iipiied to a simlllr dlsclery t3Sk
The examples for thlS experiment are drawn from ~he domain of organiC chemist Fgure
-3 shows ~be dnt example for Etptrment 2 The ltunr-e dsrces a ierivltve i e
cmround Henberzobenzene The best substrlc~lre discoveed ey SLBDLE fer this ~xJmie s
shon in Fgue J3b and the specializatlcn 0 tls Slstrulre s in Figure L3c Both of ese
discovered s~bstucure of Figure 3b evaluates to a higher value tlan tle revlolsiy slecllized
substIcture tbus tbe agortthm ~gu~s by considerIng ettenslons fom le uns~ecaitzed
rubStlcture After runnng tle algorthC1 wltb a comjutltional limit of 10 SL-BOLE prcdlces
the substructure in Figure $0 lS ~be best dscovered substructure Tbe resng secazed
substructure is sbown In Fgue -5c Again botb of these substrlctlres are added to tbe
background k~owlecge He eve- SlBOCE takes advantage of the substrlctlres already stored to
deftne be ~ew SJbsuuctlres n ezs of the substructures already known As a result tbe
background knoNledge IS extended blerarchiclllv upward to incorporate he reN subsiuctres
Ii
C 0
H- C C - Ii C1
(Sr C1 rl
C C ~
C C C-H C C- ii C C-Ii 1 i
~ Sr- C C C
H-C C-Ii H C
Ii H
H
fa) inpl1 E1Iample (b) Dlsco-ertd c S~Kia1ized Substructurt Su bstrlc~ule
13 Experimelt 3 Discoering Classifying Attributes In 1111 tiple Examples
tcst -acn emiddot MI -middotmiddott-- lSS- middot- bull smiddot_middot_middot ~ ~ MI _M e~ 1 li bull ) )~ w~ --IW _ ii - _ bullbulli~ __ 0 bullbullbull ~ ~ lIo_ ~
of tile given attr~butes A recent approach to cnceptul c1se-ng cllied goal1mented onepna
clustering [Stepp86) uses a Goal-Dependency etwork (GD) to suggest relevant attnoutesgtn
whIch to focus ile attentIon of the conceptual clusterI~ rocess
A GO direc~ the onceptual dusterng tetbnque inpienented in the CLlSTER CA
Frogram [~logensen8 7] [n CLLSTERCA the GO IS tovieC oy the user However the JSer
lay not ibmiddotays know JtCllcb Jttrtbutes or cmOtnJtlcn of atrlbutes are relevant to a specific
Froblem In this case be best substructure discovered ry SeBOCE in he gIVen Cum ies can e
added to the GO T~e suost-Icture attnbutes added 0 tle GO suggest problem-speclc features
t elp focJs the conceptual lustermg process Exper~lent 3 demonstrates how SLBDlE and
CLlSTERCA work ~ogether ~o discover concept)al Lsterrss based on rewiy dscovered
Thus far the operation oi SlBDLE has been etalned in e c)ntett of vne inpu eta~pe
SlBDlE operates on Dultple input examples in encty be Slle Clanner SlBDlE JIIJas
r~Fresents be input uamples as a gnph with sin~le rFI~ enrnpies represented as a single
)nnec~ed ~1h and ultple Input etampies re-reseled 15 J Jsconnec~ed gnpa middot nhl connect~d
subgraph component for eacll example Becalse ~ucsrI~I~es are connected raphs tle
substructures discovered ~n tlle contut of IlI1tpe ~lanples cannot onall strucure
spannng ncre han one ualple
Tle eumies or s elperllnert COnsist ci er roIs irs Ioduced ~y LJson Lasni
and later used or psclological test~ng ~fedina 71 7tese S3ne -1In5 lre used c enonsate tle
53
nuel f ~a=~les ~C =us~er1ngs havIng tte su1est descritlons Lsing this lEF JlC te GD-shy
descrlOed il [10gensel87J the two best d usterngs discovered are
Color of engine Aheels IS Blackmiddot middotWhitemiddot
W~en the eUQ-les ue given to StBOCE t~e est substrJcture found by the leurstic-baSd
subs-~c~~re discovery =odule is
lt~CAR-IE~GTH( OBJECT-OOOl SHORT~(LOAD-Nt-18ER( OBJECT-0001 ONE1 ~middotHEEL-COLOR OBJECTmiddot)001 )middotHITEgt
In lLer Aorcs t~e est substructure found s Q jlUJrr car wult while IyenMeLs QlId )~ 0lt1d By
addmg tls substuc~ure to the original GO and unnmg CL LSTER CA agam on the saQI
exa~Fles Je ~wo best lusterngs discovered lre
umber of short Clrs witJl wllte wheels and ore load S middotZeromiddot i Onemiddot Two to Four
Tle best dusterlng discovered by cttSTER CA s tie same as tJle best custermg jiscoverec
JltJlOut SlmiddotBOCE However the second best ctSeing JSeS ~he SLBOLE-diseo-ed sUOstrucure
attribute to custer be Input namples Thus according 0 the LE- t1S new usierng is oen
har ~h Jsterirg -ased on the color of ohe engine wheels Without the suggestin from SCBDCE
T-at CL_STER C~ was -lnable to discover t~e l usttrrg based )1 the suostmiddotc ll Jttiln~
suggesied ly SL3DL~ 5 ncs due to CLLSTERCAs bias cJoarcs l~ at-oJes llmiddote~ n the
StrJctlre oi the rooi tee Another system t~at works with be i1u-nal structure Jf 1 -proof ne
IS tle B~GGER system (Shav lik8a] BAGGER g~MfalitS to N by 5nding oops In the pr)of ree
that can be collapsed into one nacro-operator representing an i~eative instance of ~be operators
within the 1001 Tle PLA~D systen [Whlteha1l31] JSCS a method sialllar to SCBDLEs to discover
macro-ope-ators InvolVing loops and conditionals In observed slGiOences I)f plan steps Setton 5~
CiscJsses similrlties and differences between SLBDLE and PLoSD
Eleriment J shows bo SCBDCE an be used to find a maco-operator witbin he s~ructure
of a rooi ree Tle uamp1e for thIS experiment 1S drawn from be -blocks world- domaIn The
Jperators forths conaln are taken from [isson80] and are repeated e10w
PICKCP(c) Peconditions OTABLE(c) ctEAR(r) HADE~IPTY Add HOLDIG(c) Delete OTABLE(c) CLEAR(c) HADEIPTY
PlTDOW~(c) Preconditions HOLOIG(c) Add OTABLE(c) ctEAR(c HADElPTY Delete HOLOI~Gc)
STACK(cy) Preconditions HOLOIG(c) ctEAR(y) Add HA=iDE)(PTr O(~y) CLEARtc) Delete HOLDI-G(c) CLEAR(y)
LSTACK(cy) Preconditions HADE~IPTY CLEAR(cl O(cv) Add HOLO[G(c) CLEAR(y) Deiete HADE~IPTY ClEAR(c O(cy)
For ~his exampe sCse ~he lnItal world state s 1S shewn In Figure ~Sa anc ~ ieslec
e
51
STACK(XZ)
~ CSTACKCYZ) PICKLPOi)
PCTDOW(Y)
Figure 49 Diseovered (alto-operator
system beause the macro-operators may oltcur in s-bsequent uamples If tle di~J eed malt-oshy
vpeators are acded ~o SCBDtEs baltkground Icnowlece a uerarch-( of maltrO-(lpera~ors can be
~cnstrutec T~s hierarchy cI~llt serve as an initial joma heory fvr an EBL systex
45 Eltperiment 5 Data Abstraction and Feature Formation
EIerment combines SCBDLE with the I~OLCE system [HoaS3] to demonstrate he
cprovtnent sained in both processing time and quality )( results amiddothen the eumpies ontaln 3
arge amoun vf srlltt1rt A Common Lisp version of I-OtCE was used for llis eUr1le -unnng
on the same Tu3S Instruments Eplorer as the SLBDCE system
Figure lOa shows a pictorial representation of the ~hree Ositive and three negatve e13t1t=les
given to I~DtCE E1lth of the synbohc benzene rin~s in the uanples of Fgue middotlOa correspores
to the Ietalled desltiption of the uomilt structure oi the ~nzene r~g similar to ~he one 50a n in
the ieft side of Figure 10c The actual input specinc3ticn or te SlX umples ~ontlns J ctal of
178 relatiOns of he form [SeGE-aOND(C1C~-T or [OLoLE-aO-lDIClC Af~- 16J
5~Cr cf 3 cmiddote DLCE lJlie T5 e~=e etcrsta~ts l ~S~~s -5C C
SlEDlE ~1n rlFrove be resuls ~i gttle-- ~earllg sses lsa-g ~ -el~c
61
--
~ Discovering Groups of Objects in Winstons ARCH PrOV3Jll
Winnens ARCH program [Winstcn 75] dIScovers substJcture In ord 0 deeFe~
hlerarcc31 descripucn of a sene and descbe groups of ooeets as IndiVidual concepts The ~RCH
program searCles for tWo tpes of substrucure In tbe blocks world domain The first type
involves a seque~ce of objecs ccnneeted y a cham of similar relations The seeond ype Invo~es a
set of objeets aCl bavl a smtlar rla~lcnsbip to soce middot~rouptng oOJeet The approach used by
the ARCB program oeglrs will a coneeture procss hat searcles for occur~ences of the tJO tHes
of substlcture ext e reislon rccess ncludes ~ll e group those OCClrleces that fall
1eiow a given threshold of tbe ~oups average Tbs section discusses tile mehod by whKh ~he
ARCH program discovers 1otll YFes middot)f substructure and how ~he method compares 0 that of
SlBDLE
lmiddothen searching or sequences of gtbects tte ~RCH progrlm considers chainS 0i obet~S
cmnected oy SlPPORTEDmiddotBY or l~-FRO~T-OF reiations All slh h3lns J ith hee or t1ce
OOjetS qualify as a secpence However as illustrated n Figure 51 not all ooects In 1 sequence
~ ~ shyB
I ~ (
E G- c c=7 0 IA B CD
i Lr
(a) ( )
63
A B C 1 SlPPORTED-BY elacr ) F 2 ~1-RRrES relaton tJ F 3 --KI1)-0F reatlon E CJ ~ HAS-ROPERTY-OF ~eior ~ tEDIt1
0 1 SCPPORTED-BY rela~lon to F 2 [ARRIES relaton to F 3 A-KLn-0F relation to BRICK 4 HAS-PROPERTi -OF relaton to S[ALL
E 1 StPPORTEO-BY relation to F 2 [ARRIES rela tlon ~o F 3 A-KrD-0F relation to WEDGE 4 HAS-PROPERTY -OF reiatlon ) SlALL
The tommcm-realiotships-list contairs the four relations Ossessed by more than half the
candidates
Common-Relationshlps-Lst 1 StPPORTED-BY relation ~o F 2 -tARRIES relation to F 3 A-KI-D-OF relation to BRICK 4 HAS-PROPERTY -OF relation c ~[ED[tt
~eIt the yenrocedure euuns how typical each candidate 5 n orpan50n to te rell~iors in le
Sumber of propUiH In Intltwto
Number of propenlH n Ilnlon
vbere the intersection and union are of tle candidates reatl015 ist and he commort-repoundc~oruhis-
ist The SUits of usin~ U5 measllre to Iompare each 3ndidate lre
A compared 0 cOIlVfWnmiddotealonsrtps-ir J J bull 1 B comparee 0 cmttW11-relalionshps-iuc 4 ~ bull 1 C compared ~c comrNm-re(lnoftJhips-lisr -l J bull 1 D cora~tred ~o comnJ()1elcotshiss 3 J 5 E omp~ed ~o commcn-eianYtShps-sl J -5
65
~ted as an ott e1Or durng tbe discoe ~rcess SCBDLE JOtS 0 Sl e
ARCH program to =ore easily dIscover sbstructures Wltb different object yes dces ot see~
difficult Running both tbe sequence and common relation discovery processes nore tban once
mlgllt encoura~e the ARCH program to bulld substructure -oncepts containing oCJets AItl
difrerent types However tbe new process would stI remain de~endent On 1e t-loc~s middotNmiddotodd
domain
T~e final comFarlson of 11e two syseS Involves ~he representation used to store the
discoveredsubstrucIres T~e ARCH rogTlm Itiizes the semantic network foroalisrn Here the
subSUIcure node has a TYPICALmiddotlEYlBER link to a general destripton of be prototYFe
sbstrucure 1nd each OCC1rence IS linked 0 11e same substructure node I~h a GROLmiddotPmiddot
lErBER Also it FORl link notes ~be substructure type of tbe node sequence or commO1
roperty In SLBDLE only the substr1cure description is etalned in de backgroilrc ~oNedse
Furhenore tbe background lowled~e caintalns ~e substrucures in a ~ieachy tlat cefines
comple subs~uctures in ier1S of previously earled nore lrimitve substructures Although tbe
semantu retwork fjrmalism of the ARCH jlrogam can rpresent nlearcbicJl sntJ-es VSto
oes not nenton tbis ability uplicitly [Winstoni5J Tle substucte reFresert1tion sed 0
SCBDLTs caciigrcund knowledge is overly i~id Event~ally StBDLE nust ~e aoe to epesert
background knoAlledge other than substructures For tlllS reason StBDtE ou eneft~ frem l
more general epresentaticn such as the semantic network used y ~he ARCH pro~ran
53 COfDitive Optimization in olrs S~R Program
R~seu ~oward a comprebensive theory of 0inltve d~veloFment bas ed Vol to csbte
~hat optlmutton $ l najor underl)lng goal in blildin~ and elr~ a knoledse 5tJcrt
[Wol$] T~e res1lting kncwiedge structlres are cptuully ttfker~ fur ~~e ~e~lrec 15S
Furhellore WmiddotU -eserts Sl~ jl~a con~esslcn ~rnc~ies -j l-e-erts It )~t-I-
-(5-s)C =---shy
s
slze of e -emer sed 0 repiace or llstantllte the eieet n te lPlt ect T~ jS-s
represents the reducCn in size of the Input telt after replacmg each ~CU7ence of ~he elemen y 1
polrter to the element Dl lding this term by tbe cost S of storing the eiemeu leids a le a
inreases as the amount of data compression provided by tbe elellent ~e8ses Tberefore S~PR
seeks a grammar whose eiementS m8llnize eVe
S~PRs compression vahle IS Similar to SCBDLEs cogmlve savmgs Recall from Secti5)n
322 that SLBDLE computes tbe co~rltle savings of a substruct~re lS a function of the number
of occ-t-ences (fequency) of tbe SUOSUlcture and the size of ~he substruct-tre If the l-tmoer I)f
OC1renes of tbe substructure lS f and tbe size of be substructure s S ~hen be ognite savings
of he substrucre can be upressed as
Te-e are tmiddotIO diiferences oetlmiddoteen rbS upression for o~nitile savings and le expression 0shy
cOrrlreSS10n value First tbe size of the Ointe replacing be sbs-ttue middotocJrrences s not
conSlde-ed in the cognitive savmgs value Second the size of ~~e $Uostc1-e ~s subtraclted on
he recutlon tern In tbe cognitive savings value vhereas e size of be etnent is divded into
~be eduction erm In tbe compression value Tbese duerences sug5e5t pOSSible -ture
lmprovements to tbe cognltive savings measure
~ Substructure Oiscovery ill Uihitehalls PLAD System
Toe PLAD systell ~Whltella~~l discovers substructure n 1n obseved sequence )f acons
These s~oSt-tCUtes are ermed maco-opeators i)r nacZops PLl~D r~roJes gele3iizatlcn
69
The ~glltmiddotmiddote savIngs sed by PLA~D is omputed as
Te oniy dlference oetJeen tlis dednition iUld StBOtEmiddots is tile mOdlnc1tion n SlBDlEs
efiniion to allow for ovela~Fing substrtcurS PAXD does not allow macrops In an agenda
overlap lslng tile ognltve savIngs heuristic allows PLAD to select among competIng aIta
mac-ol ageondas and 0 select complete mops for instantiatlon n tile input stqence
Observed ~ctjon Sequence A3YXXXYXXZYXXYXXXXZ
COXTEXT 1 ogsav macro OS
100 l1 bull (x) 1125 ~t12 CY llmiddotmiddot
8 5 ~u - (~tmiddot z)middot
Instantiated Action Sequlnce lt B 112 Z ~11J Z
COlEXT 2 ogsav macro~s
20 ~l bull (~lumiddot Z)shy
Instantiated Action ~uence A B 11
Al interesting macrops discovered End
Figure 53 PLAXD Etam~le
1
OLPTER 6
COSC1USION
Complu lllerlrhies of substructWe are ubiquitous in be real Jrcrld and JS e uerle~S
of Chapter l demonstrate in less rea1istic domains as welL L1 order or an HeJige~ tltlty
Url about such an environnent the entity nust abstract over unInuresting jetJil and discover
substrJc~ue concepts ~hat allow an efficient and 1Seful representation of tse rlVlronelt T~s
teslS Fresents J ompuatlonal method for discovering substructure in examles rom suced
domains Secton 61 sumnarizes the substructure discovery theory and metodclogy CISSSeC l
this Iesis and Sec~lon 62 ctisClSSes directlons for future work n substructure cls)Ve
61 Summary
The uf70se of substructure discovery is to idetify interesing and -e~~te st1c~Jl
onceFts wnhn a structural relresetatlo~ of le enVlrcnmet SUCl a dsccvery systel s
aotivated oy ~he needs to abstract over detaJ 0 ruintaIn ~ lllenrchical descptlon of e
environment and 0 take advantage of substtucure within otter knOWledge-oases to ~educe st)lge
equlrements lnd retrieval tlmes This tbeslS Frese~ts ~he iUxgtrlnt 1cesses Jnd ~e~~oac)gJl
aleratves nvoiec In 3 computational method 01 discovering sustrucure
A substructure discovery system must gIerate alternative sucstrucJres 0 Ie clnsicered y
he discovery iTOCess Flt=~- aetbods of substructure generaton 1fe disssec nrlJ tt~ans
omblnatlon et~a~sior tmimai disconnection and cut disconnection Ttes~ ne~cs ff~r acrg
t o CilnSlons T~e etpanslon oethods gelerate luger s~cstrJcJres lJ1g S~tre
imal~~ 0nes lhiie he sonneclvn nethvds ~eler3te snilile~ S srmiddot~-e 7~nO 5
T~e SLBDLE syste s 11 =ieeltatl Cif e ~esses IOed 1 s~~s-_~
iscoer SLBDLE lotlta1s hree nocules he Jelirsl---=asec itstmiddotJcr~ dsce- _~
he-s-ased sc~very modue utlizes tile substruetre gener)lon and selecto-l pocesses ~
optioral background knowledge to Identify interesting substructiJres In tile 51ven nput eumj~
The ~ialization module modina tile substructure to apply in a more constrained ewironme
Tle backiround knoledge module e~ains both the discoveed lnd specIaliZed substrctures i
nierarchy and suggests t~ the heurlStc-based discovery mocuf llith of tne known substruci1S
aJply to ile current set of injut eUlies Etr-eriments with SLBDLE demonstrate tbe utility f
be guidance provided by the beuristus 0 direct tile search towards Dore interesting subsiructUes
and the possible applications of the SLBDLE substructure discovery system in a vanety f
domaltls
Tle ablity to discover SJbstrlctJre in a structiJred environmen~ s important 0 the task Jf
~eazli1g about the environlent For this eason sbstructiJe discoery eresens a i=port-
lass vf problems in the area of nachine learning OJeraton In real-world domains demands f
eaning rograms tile ability to abstract Jver l~necessary ietail oy idenfying in~erestllg ates
n the ewironment Empirical ev(dence I-om the SlBDlE systen deronstates the applicabll
of he conc~ts resented in tlis thesis to the task cf dlscoveng ubsttJctJre in uampes
Etpanding upon these underlying concepts may fOV lde nore nsigbt into he development v 1
S1=struct1re discove-y syrem capable of Interacl1ng ~itb a real-worc environmen~
62 Future Research
$everal extensions and aplicttlons of the substructure CISCO very method and the SCBDCE
system ~ lire furber nvtSti5ation Interesting ettensions 0 the sUoStJcture discovery mei
include te ncorporation of new ~eurstics and b3cKsrotond incmiddotJedg int) tle discovery FfOCSS
Sbull loemiddotmiddotstmiddotS llmiddotmiddotbull r bullmiddot j -lng middot Ji -- ssge - - e middotS _ W shy
reVIOUS suess of silIlLu rJe appiiauns
esear s leecec tJ lcease the scope of Ule Frocess Clrently tbere are seveal ~~es of
substrucures ~hat annot Of c1iscovered For nStance the urent dscovery algorr can
tdiscover recursive substructures substructures Wttl negation uIg lt[0- XY)-T]
lCOLORiX)IREDJraquo or sbstr~ctures with constramts on the type of structure to lJhIC- bey can
be cmnected The abilitY to discover these types of nbsuuctures will require aUlor mociin3~ions
in both the background knowledge architecture 3nd the opeations peforned by the discovery
algorlthm
Improvements in botb the scope and the rerforrlance of he substructure discoie tlocess
may arise from a better method of substrlctie instantiation C rrent letbocs do not
satisfactorily re~resent the full amount of abstractio1 provded by a substrucure siantiation
by a Single object retains no nformation about he Ily In Ilch the substructJZe middotgtwas on1~C
lith the rest of the input nample Contta5tlngly Instantiation by a single re3tlCn reseres
exernal connection information (or each obl~~ r 1e s bslc~ule An instlntatlon retlvd ~ba
Clnomes both approaches rIay be more approprate 8et~er SUOSilmiddotuc~ e instantlatlon lothods
would allow abstraction to take place aut~matically wthm he disc~ver rocess Te dscovery
process couid work at sevenl levels of abstraction y instantlatng Interreltii1t s bst-ucures Itagt
the urrent set of input exanpes In this way several aerarlIaly denred s bstr -es nay be
discovered with a single application of the discovery algcnthr1
Iany of lle sugges~ed improvements gt the ~ubstlclre discovery rocess ~eresen~
uensive rIodificatlons to tlle current nethodology However nOst of he improemellts ~rvolve
the lS o( alternatie forms of backgrolnd knowledge The ~rlsed exensiors 0 he scoery
rocess nay be simplioec y Itpropnate extensions ~c e SlbS~lutl-e ~ackgrcunc ~Necge
z~ess l ~
One the major limiting flctors n SLBDLEs ptfocane s ~he sFarse amount
knowledge applicable dlr1ng tle discovery ptgtcess Tbe prgtposed eltensions to the backgou-a
knowledge wIll signncantly improve SlBDLEs ability to learn substructure ncepts
623 Applications
Several applications appear prOClLStng for he SlSOtE s~bstrJctiJre discovery syste
SlBDLE nlgh be used to detect ex~re mOltles and subimages in a risual image organl2e and
compress arge databases or dISCover lIIterestng features In the input 0 other machine learrung
systems
One of tbe goals Jf Visual process1ng is to construct a ugb-level nar~reta~lon Jf 3n llrage
composed only of pluis b tlis ontelt SLBDlE could be Sed ~J detet epetiw e ratierns n e
eglons of a visual Image Insuntiatng he Fatter~ to 3bstract over beir cetailec reresentalon
slmplines the image ana allows other 1son processmg systems to Jork at a hlgber ie e of
abstraction
The n~ to access information fom larse databases has betooe oonlonlac In ncs
omputilg environmentS tnfortunately as ~he amount of owleege In 1 iatlbase nCrelses so
do the storage requirements and access times Lsing tbe database 15 nut StBDlE ray cisccver
repet1tive substructure in the data Tbe substnlctures can be ~d 0 compress he database ard
Impose a hierarchical eFrsentatlon Tbs reorgamzation rdles le $t)lge eql1remen~s of tte
database and decreases -etreval time for4ueres referenCing data Jit slllar sJostr cture
sstems lost current sys~tQS He unable t~ Frcess the age ~~noer cf f~es laltle 1
9
REFERE~CES
[Doyie79J
[HoaS3]
L --]~ ason
~Iicalski33bl
G F Dejong Ud it J ~[ocrey middotEIiJJ~-8Jsed ~a~1r A A~lmiddot lew Ifccriflt UQUtg 12 (Aprtl 19~6 r= IJ-176 CUso a-ellS 5 Technical ReOrt ULL-EG-S6-2203 AI Rseul Gloup CoordIatec See Laboratory Cliverslty of Illinois at Lmiddotrara-Cbanalgn)
J de Kleer An Assumpton-Based Tt1 ~lailtenance Systn Articii Inleaile1l-Ce 8 (1936) Pl 121-162
J Doye A Truth taintenance Sysem Ar(idallnleiUgfl1ue I 3 (l9~9) ~31-27
R E Fkes P E Hart and - 1 -l5son degLearnlrg and Exectng Gtneliized Robot Plant bullJrtificial Ifllel1igertee j ( 1972) 251-288
K S Ftl R C Gonzalez and C S Lee Robctics COlLlrol Sensing Vision cud InleWgertee [cGraw-Hil -ew York Xl 1987
W A Hoff R S 1ichaiskl and R E StelP I-DLCE 3 A Pcgram ior Lelrnlng Structural Descriptions from Examples Technical Reran UCCDCSshyF-83-904 Departnent of Computer Scence ~niverslty of Iliircls aa IL 1933
W KJbler Gtlsral Psyltwlogy Lvengbt Publishing Corporation ev YJrk 11947
J Lalrd P Rosenbloom arld A -ewel1 Chlnkng in Soar Te Aracmy of J
General Learung ~tecbanlsl faclune L4aTnng 1 1 (1986) l 11-16
J Larson Inductive Inference in tle Varable-Valued P-edicate LJgc Syse= L21 Iet1Odoiogy and Comluter Implemenaton Ph D TheSis Detarte~ of Cgtmputer Science Lniverslty Jf IllinOIS Lbala IL 1977
D L lec1in W O Wattenmaker and R S [ichalski middotCmst-ans Jd Preferences In lnduclve Learning An Eqerulental Study of Human lrC
~taciline Periormance Cognilive si~nce II 3 (1981) 19 299-239
R S licbalski middotPattern Reltcglon lS Rle-Gded Inducte Inf~rl IEEE ransQClioso PalrVll bull Jn4iysiscnd Iacniv IlceWgence~ Uliy 9~O) P 3 9-361shy
R S tic1alskibull - Theory and tethodol)gy of bducive Lear1ing in acra ~ LIlDrning ~n Artiftd41 Inlelligerwe bullJptroach i( S tichaiski J G Car)ce T ~1 )litchell (eel) Tioga Pubhslung Cgtmpany Palo Alto CA 1983 r-pS3shy13
R S 1ichalski and R E Ste~p leutng om Obseratlcn CICtmiddotl Clustern~ in lacnine Ll(l1fting An -r(c~ai IfIltWgence ApprXlcn R S 1iclski J G Carbonell and T1 1ited (teL Toga Publishlng cloany Palgt Alto CA 1983 11331-363
S 1inton J G Carbonell O E~zion1 C- KOblock and D R Kckkl -Acqulrng Eif~tiye Searil Control RiJles Exianaton-Bas~d Le1r~In~ n e PRODIGY Sstem Proceedirgr of riu 18 ller~~oftllf cnirte Lecr~tg Woricsnc r-line CA June ~87 tFmiddot 11-lJ3
T 1 lilell R Ktile- and S ~C1-ClceIE Et-rac-3st GeleraltZltlCl- Cnuying e dre ~r~irf 1 Jarmiddotl aS -7-50
J G Wlt= Cognme Deyec~et As Ot~=a~1 - c- - -Ldes0 Lcarrang L Bole ed) Sfrnge~ lag Hc~lbeq 19samp
~ 11 ~=nl ~ C T Zlt~ middotGrapb T)eortlcl~ te~b~cs fo Deeclrg a~d Jese -~ -l csers IEZE rcnsccions on Cam puv- J CmiddotO hr 1 9middot = ~
83
- ooeanaile indic3ng middotmiddotbe~le or not SlBOV2 stoild )rsw e a~gi~ cwecge lodule for substruc~res aplyng ~J le ~ s~t i=~ etJ=~es
DeJ IS 11
dlsove - boolean value indicatmg lether or not SlBOlE stlould peric-l e substructure discovery algorlthm on the cur-ent set of Input ullpleS Oefauits T
s~ecalize A booiean value indicating wheher or not SLBOlE should specialize ~he bes substructure round by tle substructure discovery module Defauits 11
nc-bk - boolean value cicating whether or not StBDLE should incementally add the best discovered substructure and tle sFlaquoialized substructure if requested via tbe prevlollS keyword to the backgroilnd knowledge Default is 11
race - boolean lag for to~gling the output of ~race informaton dung SLBDLTs eucuton Defaults T
Tte esuitof 3 cd to ~he rrbd116 runc~ion is a list of the substructures discovered n order
=middot0 best to lierSt Eac s-s~rmiddotJcture is accomFanied by the occurences of the slbstrucure In Je
~rut eumes Te substructures n tbe subsequent output traces appear in the f)l1owlng form
(Substructure_~ value v eZtU01s) WITH OCCLRRECES Ire4io1s reZtUw1s I
The SoStlre ~Jmber 1 xndicates that this substructure was ~he Ih substnctue discvereC
The middotalue v is the result of the lIalueltS) computation from Section 32 for this substucure
ltela01s s the list ~f ~eatlons deining le substrucure r occurenc
In Jil tile eI1lmens the deflult lIal Jes are used for the ~11ecivty ccrrxzcnesr coverage
dicve-r Jrc cce keYmiddot1words Also to conserve space oniy ~be ~~ ~est substrlctlrS dscove-ec
85
gt bull ~
)~c1 tt I
0 c5 ~ -
13 Output for First Example of Experiment 1 Limit - 6
3qll rilebullbullbullbull
anIltten limt 6 C)Mec~I~middott bull t CC IC ~tHJ bull t coveII bull ~
Wlemiddotll bull ntl diScover bull t s peea iZI bull lli nemiddotlI bull rll
S 1e~u16 middotampIe J119 13 ~Shil~ OOec~oooIItanle [)n Jbec~middot()OOS oeet-OOOUtj 1i1pe oect-0001sqvue lSnampfI obec1middot0(01)ooCrc~el [ollobec~-JOO1~bec~-JOO2tDi
-ImiddotITi OCCtitRNCES (srI pe l )trllncle] ~)n( tU 1 et) sililpe(Sl sqare snil~e1 -cce J 01(1 Lc I middot t I
(SrIfI tliinlieJ lone tZJZ-t [slIlMIisZsq-are j snilf c2~ooCrcel )l(IzCZJ-tDf CsrlC tJ)toilnclei lOIl( JsJ)tl lnilMlisJescarej snape(cJ)eertlej O1tsJ_lJ-t)fCInlfI 4 -maniud [o~( t4~4 J-t snilfls4 1sqaue1srape( (4)cCltj 0154e41-t j)l
Sllstc~e value 9 56096 (ron~bec~-OOOIobjectOOOltl sIIpeobec~middotmiddot)()()1 )-sqlue] [s~IICJbec~-0002 -cile J~ ~ 0 e~middotOOOl)blaquoOOOZ)etD
-NiTH OCCtRRESCES ()( ~U Jet] Sllilf sLOOiqe ShllMCl)ooClclt] ~n( s1c t) f (~ ZsZ)etlSlIapuZsqOuei [SlIilpe(eZ)ooClclt] 011s2t)1 r ~ Js3 )et [SlIlpeUJ )-squue sr~c3 -emle ~on(sJ~ et)) (Jr~ ~4s)et sllilpts4Slaquore ~snilptc4)ctcel ll(s4C4 -)i
S os ~c~rbullbull4 vahu 3 l0488 ~SilptCOle(OOO1 jsqlare SiIpe)OlecOOO2 acrcej on( )bec~-OOOI b1ec~middotC002 -IiTH OCCtlRlNCES 111lC S )squae) SlIape(cl )-crcle i (OQ(slcl ~tDI ~HIfI S~)-sq1larej slIilpe(e-cmle) [01l(s2c)1 DI riI~~ sJ-squareJ SIlamppe(c3)-CC] lOIl(sJc3)atDl ~r1 S4 -sqiI~e lSnile4JooCuce [OIl(S4C4)-j)
11 aCC
A14 Output for First Example of E(periment 1 Limit 10
gt svIdue liait 10)
n~ e 10 CQnnec~lv~~1 bull t ompa(~ltSs bull t coyeilI bull latmiddotbc bull ~ll C$cove bull t SpeCiIze bull 1 emiddot lI - t
Ss~c~~e6 -alllc J119~1J ~-Ipclbec~OOO8tiln~l ~ ec~-)()()~ CCic~)OOLt HiIlC ooec~middotOOOl ~clUre sapt oecmiddotOOO I-cce )tl i)et middot)()Ol ~ec~ gtco
T- OCClR~Cs middotS l~e- ~ ~~amplie J~l ~ ls 1 ~ ~ate S11~ JIe s-~ c C~t~t~ s ~ bull ~
S3S~middotc~~e middot bull e lt$009-0 -ec JOCg )~middottc~OOO
~r ~~laquo~)C01oolaquo~~i 177 OCCt~~E~CS
~ ~l S 1 ~S~lgte st s~ middot~t ~~a~ 1 cc 1 LeI ~ -=f ~s lt ~S~i~S r~1e )I~~C bullt~re J s~2 ~~ ~ _ ~J53 s~e-sJ l(~~emiddot 1~l~elc3ce ) sJ 3 ~JsJ smiddot S-lgteSJ -i_lt1e 11t1c400(1ct 1I54J
A 16 Input for Second Example ot Experiment 1
Dtfullie ( rOl ~O 03 ~04 nO$ ~06 10 03 109 n10J
conrec~ed nU (COIlaquo~ed 101 n06) ) (corzlaquoed nOI nOZ ~ conntced n02 071 (carnlaquoed 102 O3i t ~ortced OJ 03 Coneeced n03 n04)) ~corneced ln04 n09)i callnec~ed n04 nO$) cOl1nlaquoeO nO nl0) (corrtced 100 10middot ~crnlaquoed nO nOI)) oC)lIneced 101 09 t) colllleced 1109 nlOi )1)
Al7 Output for Second Example ot Etperitnent 1 Limit - 4
l~alees (onnCCi~Y bull t ~Ol ampC~ltSS bull covrllt bull elSClve~ bull S1laquoa~zr bull 1111 incmiddot bull I
5o~lCle 2 i~r 9~J55 cotnec~ed( cncOO01)eC~OOOZ~)) 11 OCCtRR~Cs
c)ltc~( OI106tJ corlaquo~~ n0102t I laquo~ed( 10210 ~) oec~ee( n02103 t) I coec e n03 101 t)I
middot O-laquo~I 10304 )) col-ectd 01 n09t ceced 04nOt
middot co-eccci ~OOmiddot conlec~ n06I)middott)1 middot~orrlaquoed( 07101 t)) eonllaquo~IOI l091t 1)) onnect~( 109n 1 OJ_t)) I
So gtstrlctlre3 VIlle 2803$ C3 c)nlaquoeC oillaquo~middotOOO )o~t1000 t corececl Ioecmiddot0001 )0laquomiddot0002 i OCCtUDlCS
coeced -01102 J corecec( 01l06j1 cteced 0610 t i corneced( 01 106 at pI COlt1 ed( 0101 ~ _t cornec~ec(O ltOZ t j
ltcceceel 0103) t] coreced 101 02)~ I cooececl 0103 t cornt1ec( n0201 t )r connecedl 06 101~t i l corlc~eel 10 0 ~t) coulaquoe 10 01 t corIecec n02I07 tgt coeceo 03 OS at j corececl 02103 )~-shy[cOlecedl -0J 0 acrcec( 02103
~-ecedt 103104 coeceC(O3Oai middot rConeced 101 lOS bull c(cedl 0308 t(t- ~uS~V9 ~ ~~~te OJ1)amp ~
89
S1~~middotC~middot~~S I~e $0 c~ecte Qee~OOOlec)oo emiddotee~ e~middot)tJ(J ~ eJoeJ a
ec~e ~laquo~OCC laquo~vOOJ lt ecr~ec~edA~becmiddotJOO1~bec bull)CO
~-- )(C~inE~CS -- - 0 - - --- -06 0 1 onreelcl-Ol -0 -01 -~ ~~eQ e~_ v c ~e( bull _C bull bullbullbull _I~ ccn e e bullbull_0 __H
ltcoreceel C3108 bull ~Ieced 0- 08 t lcor-eeecl 0~l03 ccnec~te 00middot ) cQIrec~eeC409 j crlectedllOII109J-t] ccrlec~ec( 03104-~1ccnececmiddot 0108-) [co~eceebOs 10 ] [connecedU10910)-I] [cOCec1CGlOolAOs i-t] ~conecttd a04r09
lSl~st~Jc~e6 vll~e 31 rcr1eced(obec~middotOOOob~middotOOOJ)tJ [coreced(c~middotecmiddotOOOl JbecmiddotOOO4 eO1rec~ed( )bec~middotOOOJbect0004-1 j lC0llJ1ec~ec(oolec~middotOOOobJfttmiddotOOO3 bull ~coectec lOec)Q() 1~oe-cmiddotI)()()
~1TH occtunrcS i[ rnectec( 02103 -~j lcnl1ec~tdl ~02l0 1 i conrec~ec 06~O~ ~corecec( ~O10 t conected(10 106 C)1receatO~ 08 -d con1e-ctdt 10~10l 1] [conl1ectecl 060 - conrC~tci 0010 -t [CO1Iec~ed( ()1 06 bull
[ COLrec~ecl v1 0 - CCnle-c~eC lO1 01 )t] conlec~td( 10 108 _t] conlected r003 -conrecedl 1()20middot 1t ltrC01neceel06 107 ~ cQIlJ1e-ctedl03 05 tj lCOlec~ed( ~O lCj t i c~nlectedt lv201 i-I nec~e1 1010 I ([cO1necee( 103104 j ~COIUle-c~edl OJOI) lccnnecttdlOi 101 -tj [ccn1ectedl nOtlOJ i-t conlected 10210 ~ I
l~lC)1ncceci( 05 ~09 tl conecedr03l08 )IJ conn~ec 0101 bullt conecttdl nv20J)-t confteced( lunO 1 i coreceel 0203 itl ~ cnIt~~ec~ 0409t [cr-e-cted( 01109 itl ~ CClle-ctec( lOJl04 t LCQnectedl nOJ08 )-t i
[ coecec 0 08 -1 conllececi( 040911 i ~conltcte( 0809 t] [ccnlectee( l0304t [connected( lIO1 lu8)middot CQneced 04 lO-t ~ connectd( n04l091tl ~COnlecee ~08 09 -I] ~crneced( OJ()a middott ~ Con1ecedl u3 l08 -t ~
C)Ieceel09 l10)-) ccrneced(04 1091 I[connectcil 08109 111conec~edl nOJ~04 t (conreC~ed( 1uJ108 l lt ~ -couec ell( ~03 04-] COn1ec~ed( OJ IIo)-I] lCO11ec~eal 09 11 Otj COltecedl n040j t (eonec~ee 1v4 Iv9 lcolneeec1l oa 09 ld ~conne-c~edl r0s 11 Ol-t j ~conec~ea(1J9 110 _t] eonlec~ed 040j i-lt ~conec~ed 0409) t i
SU~S~~middotc~middote Ile 9j4j4$j CC)n1eeed(ooeClOOO1jectOOO2middotj) ITH OCCtRRENCS c)~~ec~ec( ~vl ~06 middott D coecec(OlO ] Qe-cee( I01~0 J) cortec~ee 10 CJ 1])1
co~ree~edllOJI08 ~D middotcoec~ecl 03 04 t]) I cOlece=( 104109 _Pi oecec( 04 0$ )-1 DI coecee 0$ 10 _t i
middot1recee(06o -tlraquo creeee( 0 108 itJ) ~cQecee( e8 09 I_Pgt ~ ceeee( C9l1 aJ_))
Ec ~lce
Abull 19 OUtput Cor Second Example of Experiment 1 Limit 10
Betti ~Ice
i bull 10 c)nnec~vty bull Cljllcness bull t~lte bull ~
umiddotlI bull ~i dscQe~ - s~iALe bull u1 IlC bull 1
SI~ -C ~e bull ~ e SC) gt rcecl i)-ee middot0001ec()C()4 e~ece ~ e JllO 3 ~ e middotocc-a CQeee) e-c jOCJI)ecmiddotOOOJ bull cecee ~omiddoteolmiddotOOO1loec middot)oO -
1 middot)cC~RE~cS 7ece~( ~C~O ~ ~c=~~eete -0 ~O at c~~edt ~Ol~02 bullbull ~~tc o~J~ -J bull cec ~tl =C8 - ~e-ec~lO-~Oj~ cec-ec-O~CJ ~~ec~eau_~G
91
s~~middot~middoteltS1~middote 35 middotcteelmiddoteemiddot)OO e~middotOOCJ c=lce~e middotecmiddot)CO~) ccmiddot)laquoC-l bull 3ect~ec~middotOOOJoemiddotOCC4 lmiddotC(eCec)(middot)Ietmiddot)OOLe eeec e1middotC0 CC no
TrH OCCtUENCpound ~ecel( -C - l ~o~ e ~ -~-e~ec -06 -0middot e 04_middotecmiddotCC )1 -0_I04_ - - I 00- ~~~rcee(lOmiddot 08 ~~c~~middoteOO (Qnc~ec~06~O middot_c~ec-v10=t 1eet~ O~
~4 -~ -(~ -0middot ~~ ~~ ~ ~ bull t f
=ecee~ ~CllC1a cre~e 0308 a corC(eC 0middot03 at ccC(ec 0OJ~ ccecee ~l )- bull =~eC~c06umiddot a ccc03OS ~ c~nnec-edO-OS bull eceeedlnvOJ)tmiddot ~ecee JJ
~ ~cel C3 v4 a e ~ee-CI 0310$ CO 111 ecec1 0 01 a t i cected l02nOJ t ~ coec ~ec ~ J )bullbull cccec 01109 ececl l03l08a ~recteCI O-Olj collecec 020J) COllecee 020middot cII~ececi 02OJ l conlec~eCl 104109 at [cerec~ed cOl 09j ccfectec1tOJn(4)t rConlecee 03 08 (conec~ed( 0middot 108 j eonlececj 1I041l09 t COll1ec~eC( 110109 ccleced( 10304 t lc)ecedl 10308 i leoeced( 1040$ ) CQ1lIectecl0409a clnleced(1l0109 ~ COllectedtnOJ1104) t (cOlecedl 0308 i connecedl 09 110 conccec( 04109 [CltUleced( OI09i ~COlleced(nOJn04)( [corlecec 0308Ia1 (coneceo(110304 bullbullIJ conlecteei 0110t] COtUl~~(I109 10i-tJ [CORnected 1104nO-)a( [conecedl r04 n09 a i tcOlnecedl08l09 t1eonleccdllO-tl0Jt [col11ecedCt091110)-t] [eolUtclecil n04nO)t [eonectedt n04 09 )t) i
A2 Experiment 2
Eqenment 2 illlstrates the use of StBOLTs Slb$tructure 5plaquolalization lodule 3nd
saostructure background knowledge nodule Two examples from the chemical dooain ae
eserted Tbe resulting output shows the ~rformance Improvement obtained by lpplytng
previously disc ered subst-uc~ures to subsequent discovery tasks
11 Input for First Eumple of poundXperimenl 2
kXltle 1 - ~J 4 h5 I) el lt rl)12 1 2 cOl cO2 cOl c04 cO5 c06 c07 cOS lt09 ctO cll ltll c13 lt1 cIS lt16 el (1 ~ lt19 e20 -= 1 ~J ca i
S1iemiddot)()nd IlIU (douolemiddotmiddotXlna Ill lm-y~ 111)) ICrmiddoty~ --1 -)rllcrmiddot oomiddott)pe (Il2) hydol_) toH~C IIJ) )C~Qlm) (l1m~Ype (~ ) yclen i o~ )gtlaquo -5 -ycrle) mmiddottype (h6) ~ydfOitlli (I I1middotYgtlaquo cl ~ eloreJ ( amptom-y c1 elre Itlmiddot YC Or ~rle (itOIl-~y~ )r2) bromle I amptonmiddot YJe 1 odel 110m-ly i2 ode I l~ype cOL ea)on) lmmiddotype cO2) afOon) i oomiddottype c03 af)on ltoCl-Yt (c04 aflO1 e Ie cO~ i carlOll ltype i e(6) arOoll1 (ampIOm-type lcO alOonl (amptom-rype e08 aIOn I a ll ve c~) Cloon amplommiddot~Ype ul0) ar~ll) iltom-type i cU arOoD (amptom-tyt (c 12 elrlOft OlmiddotYlM e13 agtonl (ItOlmiddottype (e14) Clr~IlJ (amptom-type ic1- I afMlD Iitom-type (e6 CIOon lcn-~e ei ampflO) amp~)O-type leU) arOon) (ommiddottrpe ieL9i af)()ll (ampt)Cl-type e20 eafOoIl ltoO-type e21 cltOoni IIlC-ry e2) af~llj OIlHYpt leJj ar)(in (amplcmiddottype (c4 cuOon SlilmiddotOolld leOl 1) ) (sileOonct cO2 ell) t) SUtlllOolla (eOS Ill ~) (slIiiIC-bolld (e12 2 ~ snlemiddotbonG (e lei t 5lCemiddotOond (el2 hJJ t)(sU1le-bond c24 ilJ sile-bond (cZJ ell t) si1e-00lG (c20 l4)) siexlnd (c13 1)rl t) (sCiemiddotono lt09 t SllImiddotbonc1 OJ 6 I Sric-oonc1 (cOl c04i t) sirlemiddotOd cO2 e04) ) u1lie-bonc I cOJ 06 ~ I SIlitbonomiddot cO$ cO slncic-o10 (lt06 ~ J ~) (Slr eXlrd cO 0) ) i SIllie-oolC te01 elL Hlie-bono cOS e12 n 5ce00lt1 cl0 clmiddot4 ll~it)Ond Iell 1) 1) sll1iiemiddotboa lcl] 1 t $lIemiddotbond c4 (15)) slIc-olO ciS el i~)Onci e16 (19) 1) sllIle-bond (el c20 ~ (SIlie-bond c19 e))
slIe-Oolld (ell cJ iemiddot)OnC c1 e4) ) (d01Oolemiddotbord cOt c03) l cotlolemiddotbond cO cltJ5) j o)uliemiddotbonG c04 eO cHIebord lt06 cl01 ) dOlObiemiddotbond cOl cl11 douoiemiddot)()nc1 (lt09 cU ClIOumiddotbond e c6 d~OQeOcd el-l e11) I doyaiemiddotbone el c 0) ) dcuoemiddot)Onc1 c~ 2 cnIiemiddot)Ono eO d~I)tmiddotboro c22 c) ~JJ)
sejc cll)~ lo-te6 ~or~ i~O~-Ye ~1 Cil~X)~ do~ieccj(cl~l _t ~~omiddotmiddotf cl ~ middot~X)n ssemiddot~e lJl segtonciie2le2J ~ ~I~O~-Ye 4r~~oie olt~c3 Cl~gtOr dO~ middot)Ce c0 ~ t i--v)fCI4 Ci~)on co~iemiddot)()d(e1Jlt141_t sfl-gtord cO ~4 i)middotecOC1~c - ( 1 0 ~- ~s semiddotX) ec cbullbull i-_- VeImiddot -C1r ) - eI 21 -c0l de ~ e -Xla( lt1 c limiddot1 ~ a~Olrmiddot t C 1 Cl~ Xln wi e- )()~e( c1 I s 1 -lOncii cl Llt4 i 0 ypeolJ )ryciroie 1 tOIr itI e4 cl~bon j do I olemiddot nel c2 co J
~eI C ~ ~e CO 01middot xlIld(c15 c19 )t ~slnile orot cU~J fa 0111- YeI c -Clxln sp-X)nd(9c at )lyfec19J-cubo=DI
Substrlctlre38 -al1le ZnOOl ([sinClebo1lcl(QbectOO5S 0jecto19 5) ~clOubllmiddotOond( lbJect-OI 60 b)ec-Ol -I bull1] ~ommiddoty~obJect-Ol 6 -carbonj [sil1le-Oond(oojec~-OOlJOoec0146 )-1 [sil1le 00 ncl(gtO)ec-O1 10 )ec~-OO5amp ~ orHytl ooec~ooll nycIoir1] uom-ty~ obctOO5S -carXlI1] [ltIoubebond(JbJec~OO lO)ec~-OOOs t] amplOIlltYel00JectOOL3 -car~nl ~to1lblemiddot)On4(obtc~oo13Jbjec~-OOOl atl snlilOond(o~lec-OOO5 olec~-)()11 )_t tommiddot ~fe 0 bec~-OOO5 -carbon J Sll1lle-bond(0 biec~-0005oofCt-0001)t j (atommiddot ~YpeOOlCC~-OOOl ClrOon j))
VoTrH OCCtRlENC$ Slrlie-I)()OI cOULt de ~le-bondrc04eO~ ) [ilomy CIlmiddotCl~n lmi emiddot~nci( cOc1 Ct sliie-Oon4(c01c04 al (l~y ho)ryorole 110111- pe- cOl carboni do~iemiddotl)()nd( c01cOJ j orypeo cl0-ltagto ltIouolebond( c06cl O)t [uqeborct( cOJn6 t l~ln- type( c06 altlrXln i 5li1e i)onct( cOlc06 )1 ucentm~y cOl -lt~bol)
( sliie-bcnct( cOZc1 t i [cioubie-igtoncttc04c01 1] UOIIItYjle c01 ooCIr~ni ~slnile-bolul(e01c 11 tj Sltiemiddotoond( cOc04 ) [01T- type hi nycrOlel lom-ypet eOZ -car=onj do1lble-Oonci( cOZc05)t] ato~typtcll caXlr dou~le-)OndlcOScll ti (siie-Xlndlc05hl )alj [uoc-yltcO)-lt4rool1j $amp ie- xl ~d( c05 eO J t (a tormiddot yXl- c05 i-cirboA) I
(slliie-boac(c13brl t (dOli Ole- bord c14cl ltJ [uemty cI41Clrgto111 slnclebonc(cl0c14 ltJ S~ieoon4c13el ( t tom- ~y h5j-hydroleAj ~atom- peo cIJ -carbon] ~dollOle-Oonc(c09c13)tl on~C (10)-lt4o1 i (101l~lemiddotXI~d( c06lO) t j [sU1Clebond~c09h5)t] (ltomtypet c09 1-lt4rbol1j sti lemiddot)Cd( c06c09l~ r totr - Yjlec06 (rOot i i
(slie-oondlcl ~ t [co oiemiddotbond( cOSell aj tom-ypeo ell to(lrbon] Sltlile-bondl cllcl~ -1 Slnlle middotOonelt cOc 1 )~ (011~Yjlemiddotltlllyciroce1iitOIll-lpeo (11-carXln j clollllle-oona(cllc 16middott i tn- tCIc15)-cu)On douole-Ootd( c15c19 )t (slllei)ord( cI6h2tj [ltOIll-tYped 9J-ltubol1 J sitie bard( _16c19 1t ~l~omy(16)to(lrbot I
sliiemiddot)()nc( c13 cZ atj dOloie- Ioncii cl S 21 ) l uoctypeo c15 middot-carbor] slnile-bond(e14cU 1) slie-1gtondtc21cJt] [ )trmiddot~YXI- 4rydrolenj ~tommiddot tVpeo cJ cubonl dollblebonc~ cJOcJ )t 1 Y~ C14 -c~1gton i QU ~ie-lOn(lt 141 A t [sillie~ndlc~0h4t 11Qm-y cJO)cubon i siemiddot xInc 1 ~ ltOJ t ~itOIllyMl c 7 -cUbonJ)
Sie middotX)nd( clLZt douleOonc(cl ScZl t (UQrn- 1 Clrgtonj ( sure-~nd( d ~ cl )t) slie middotgtonc(cllc24t [itQCl-ypeo ~J )llyl1rle llom- tYf c4 middota(aroonJ do bie- ~Ic( c2c) t 1 or - eI c 15 laClrlot co Jie- gtltIad( clSeI9 t [SIile )orell l~J ) t 1~or- y c2 cuoon1 Si ie- bonc( c 19 c t 1Q1IIvfI(19 -caroor j
SIJsJetae-l7 middotampi1e lO19middot 369 r(sinll-oond(bect-OO~l~)ect-OlOOt tQm-y~l ec-Ot 41 -ltampxIn ~lle-I)()n ooec--Ol -6oect-Ol -1 tJ ~atom-tTpII ooec~middotOl -6 -ca-XI s~iiebodi oJec~-OOl Jl lJec~middotOl -bitl Lt e- gtond )ec~_014~~ Iec~oo)tJ [uom-tYf~ccmiddot)()l rydOCr 0121 oec~-middot)()s5 CUXIft e~ ie- )onG ob)ectooIobcc--OOO5)t1[ltOm-tYI obfC~-OOt J ~centar~Q] co oiemiddoti)onlb~ecmiddotOOl JJoJec~-OOOI alJ Stilbullbullcncl(bec~-oooso~~-OOll ti (tom-tyobec~-OOO$ -cagton isileoondi OO)ec~-OOOIIcct -0001 t) onmiddotOO)tCt-OOOl-ltaC~)
~o e ) ~Wlnt ubstUC~oltes
Ssmiddotc~eO -l~e III ~ ~-~y~QIec0200l orQrecobulloee sille boct oecOO5L)ec~-OZCO]1 Qn-Yf00 lecol -I -cu)on [toble-xI~c ooec-Ol-6 lbcc-ol-l )1lO~- tyjlelo O4ctmiddot)lmiddotS Cl~xI sllebonlt( Joec~-OOI3bec~ol 6i_t rItiegtonel()b~~-Ol-1o~jec~-OO5 ~ltOIr-tmiddotjIe o~ec-0O11 ~yc~len uor)middotitI oe~-OO5S -cxIrj ltIouole-Oord )bec~-OO5 lO~~-OOO$l ao- ty bec~middotOOt3bon c)Oieond~ cc~-OOtJbecOOOld sliei)oacl Oecmiddotmiddot)()()5 bect-OOl Ulc-Ye Qec~-OOO~ a1gtOO sgie- ~c )0 ee~-JOOs ccmiddotooOl ( (i~e-y~ oecmiddotOOO1 cloo
95
gteumic uri u~~ 4di uslc Ii (IIoIetltolor 1 cI-eI~~ 1 c-sr1t ~ I 0Cmiddotr~~~ r~~~
U-llClielia Ilre-cliotcal ate) CI-SIIt elf ~sedte~lie~middotei~ CI ~i ~emiddots~C en CI~bulle JcHllIombr CI) ~Ct) netmiddotclior ca ~t~eJ CJmiddotSrlt 13 ~middotl~le 1 ei e3) sre~ ( ~cmiddotSlpe lUf3) elrcL) ~ oao-II)e i ca 3 c~e J 11ee1-eolo eu3) ~IIIe itoll- ul ur) middotlilnl (utl car3) t))l
~HfExamI Tall11 (url ~r car3 ur u$) (earmiddotshlpe nil) (w~eel-ltolor uiJ (car-It1f~I ll (loldmiddotshape nIl) (lold-l1l1omOeT 11111 finfront ) l(car-shlp (carl tlln) (lIetl--=lor (carl hite) (car-shl P (cargt opentrapc2olc) (caflnctll en $ton loacsrlC (carl) Clfe) (lolcmiddotnllo=bftmiddot carZ) olle) (-heel-coior (carll whlt)urmiddotshlpe (carJ) IUee) carmiddottli ur3) loftl) loaelhlp (car3) reeanll) (oacmiddot111=)ft carJ) 011 (lfll_l-coor (ur j wrltte J
carshlpecar4) opeD-reea (urmiddotCIII~h ar4) Ihor) (OIC-SIIampM (ur4 reell iead-rIlQor ear4 ne) I 9llIetl-color (car4) If1I~e ar-shlpe ieadJ opctl-ralezOcj (~r-lcnl~n (cad) slor~) I oacimiddotsnipe tcar5) CIteJ ~OIc-IIonber car- on) heti-color (car 5) hl~e) ( in ent carl carlJ t) ( iftrOIl t ~ car2 ca~J i )
Ctfon~ fcar3 ear) t) (inon (ear4 car5) ~raquo)))
Defxollrii ilill J ~cIl Clr carJ)
I (carmiddotsnlpl lIi) IIoI1middotcolr 11) (urlI~lIl~n nill (ld-srlpe 111 (toldmiddotr~l~ nil Uftilnt ~) carmiddotshlpe iurlJ (1Ie wrtetmiddotcolor (eal) wt-ie) (carIMlpe Icar~) I-Shlll) (carnh (car2 shor~)
Ic-SlIamppe (ur2 ec~~llle (oad-nllotllor (ea ~n) (wretl-color (car nlte) lcafmiddotshape (earJ pellmiddotreelllle camiddotenl earJ) ionl) llcmiddotrlpe lcar3) eeare) olci-rutl)ft (car J) wc ( nee-color (rJ) -lIlte J
ilof1t cal ear) tJ middotlIOIl (clrl car3) t)))))
A32 Output for Experiment 3
gt lubclue inl~ JO)
Seli tlcebullbull_
m~ 30 COIiDectlvi ry- bull t compan lias bull eoveal bull t -se-olt bull IIi ciscover bull Splaquolllize bull 1111 IIICmiddot 0 bull nil
SUraquoU1IC4 -lila l1-Z97011 (carmiddotltfttl(ob~ectoool not (iold-llUl bet oiJJCC~-OOOl oo neei-coior oo~ect-OOOI Ii~iraquo
aITH OCCt11tDiCS middotcaI~rI~ car Ion [oad-A1I=bft( carl)-on) heel-coio udwilltc)middot
ear-el-I~r urJ i-snort loJdIIIlftbft(carl)-on [neel-coior iIr3)middotnlt)lcarmiddotlIltll car Z)nonj [olc-ftllmbft( carl)-onj [-heel-colotl carZmiddotht) r[carmiddote-nI~h(carJ )aihonj [old-nllmbft(car3)-oDt] [ hetl-colort car3)lfIlI~) ll[c~r- inr~tlcarZ)~honl roc-nlunOenur)-on [lIeel-colo1 carZJ~1te) ( (iIrj1ftli car3)-snon [oldftllombeT(iIr3 )-one iheel-colotl car3 Wnlt~ care-II(car)hortj (OIC-IIIlnben caN)-oftej (-lcel-CllOtl ca4Jmiddotnte) [car-rI~hur~ -snon (Olc-rJlIlbencar-j-one (wheei-coloti iI5J middotIt~
licareIUicarJJ-snOf (olc-nllombeiIr3 -on [-lIeel-cOiOtl ur3Ilt) elr-e1lth(carJ -snort [Oidllambet(carl)-onci ~ -lIcel-colo1car3)middot~)1 1 ~clr-erI~l1 carJ J-snon (Olcmiddotnllmbet( ca3)-onj ihee-cololcar3ntf)middot Cl~middot C1I~11 a-snor rOlcmiddotlIlonbcT clr -oni [hee-colOI car) ~e i ca-ei~nca~ )SlIOr (olcmiddotlIanbricar4-on (-hcelmiddotcololea41I~) (( ur-erll( caoS )-nor (olcn~nOe(ca-Jone j heel-coiof cadI 9I~D I Zcaeniilcar)nort [olcnacOe1 car~ftci ~heel-colof ca nte)
Sstrlctmiddotre6 valJe 51033 ~hetl-cOloI0iJJect-0008-hlteJ [lltmiddot)l~(-lOtmiddotOOO8-lee-OOOl clr- eI loeemiddotOOOl -i~~r~ old-numbellecmiddotOOOl -Jrc Inet-coiol o)ee-OOOLmiddot~middottc
wri OCCtilJtOiCS
101
4~imiddot~middottle t~il 1imiddot~Ire I u~IneI~C)C 4~ume 120 d imiddotu~e ie e Hi~4-~ 1 Ui~middottC tU) i 5 0 00 oL (S1J)()) 00 ~ZJ~ ~~)fe ~1 ~Z -ytCOLIttCC stlC~Uil 01 C1tmiddotlCMiCOI Iil)smiddotlCj) 01 COJ) S~O 01 ~OJ ee COJ ~4
~-YPf ~J)~Sacll l~s~acllmiddotUll COl ib) I ursacmiddot jOJ Uie) -t7t li04 =cll 1(poundmiddotI1 i04 C 5ubOp amp04 oj~) ~-t CO) lt(W~~iludolnlfl (lOS Irib) ) ~-~Yj)f ~2) sac (s~lCa~l (cO aria) 1) (stieS-Ira (iO~ Itll) t) (su)oP (Ill rQ6) ) (SUXl 0 10- i
(beore 106 1deg7 t (ojl-ryjlt (06) middotIrstack (JnsJck-Ull (~ItIO) (unstack-ull (06 ltll)) (suOop 106 i08J (subOp 0609 Cgtcore o8 109) ) (op-tyjlt 101) lIutlck) (uftstlcklrtll108 a~) t) (ucstaclr-l rtl (0 Iftf) t) (op-ry~e I i09) jlutooln) (futdOWthrt (109 Itlt) t) op-t~pe 07) Iceup) (PIC-UP-UI 107 Illd) tl (subop (a07 110) t) (op-type 10) jllaooll) (fIIdowll-ul (110 arcf) traquo)))
42 Output for Experiment 4
PUlmee--s lmt -J connec~lviry bull t compactncu bull t covrqe - t 1Stmiddot bit - 111 OISCOVtr e t specllae e nil iAc-bk - nil
Slb$trc~~n19Yamplu 446 1396 ([op-tyPC(ObeclOOO6 )epICkUp) [pickllpalltobject-0006object-0084 -~1 sOo OOIlaquo-000101ect-00(4)et [JlIJ~IC-lri2(ob)tC~middot009oblec1-OllZ)eIJ btfore(oblCtl-OO94obltCt -0006 bull ) S gt Odegbiec~ -0 156 obec-OOO 1)t i [I ~cemiddott112 oJec~middotOOO1oblectol1l)et [Op-IYjlel 0 blec~()()94 e~zS ICC s~uk lfi I( lblec~-0094lbectmiddot0064 )_r sacifli (objteOOO1objectoo14Ietj lgt ~cic1o- middotarCobec~-OOZ 1lbiec~-0064_r op-YMobtc~-OOZ I )epll tdown] [op-ypc(obec~()()()1 )e1tacamp su bo Q biec~ -OOO6cbltcmiddotOOZ1)-tl slbop( 0 b)ect-0001o biec~-OOO6JeiD)
ITH OCCLiRENCES op-~~peli04 aICkli jllCk1i-Irampolfla)et [subOpCcOljOJ)etj [Unsuct-lrc(COJlrtC-tj (betOIe oJj04 )etl
u)Q 00iOnet ~5tlC--arI2( 10lartC)et] (op-typtl iOJ)eU1Sacel [JlIJacklfIHaOJampTtlJ)-tj ($~lCk-UII c01l=iamp~et u~cionUJlIO5ampri3)-t [cptypt-iOj)eu~cowltl (op-typtl aOl )-sacamp-] [SloIbopC oo$)et (subop iOli04 at])1
O-YII 107 l_jllckupi iHC~lhlfCo7lrtcl)etj [lubop(olc06 jet j [uIJtactlrt(~Uii e~J Core-middotI06bull01 -d s )o iOOaOZ-t stld-1112~ aOZlrtljetj [op-~yPC(amp06~eunsCkll_usack-IItC06lrt)etj ~S-ckUllmiddot rlZampi= 1gt cown-uCl10acf)_t [op-tYPC(110)p~dowlll [op-tYlaOl)sackJ (subopo1lO)etj (SUbOPCOl4101 ~j ~
s~~middot~c~reZ vliJc 31918 [op-tpe(obIC~OOO6 lickllpj [pickup-ut Jbject()006raquobjlaquotoo84lj l~slckUi~ ~otc~ ~4ooJectol)etJ [bitOf ObKl()()94objlaquot()()06ie t (SUbOP(Obtctolj6ooect-OOO1 at s tic middotamp~2 0 bec~ ()()()1~bjtttollltj top-type OOect()()9 )-ulIJtack) [uuacamp--1fI Hobiec0094obtc~-0064 ] I~ampck-lrc1obltc-OOOl~bKlmiddot0084 )etl [putclowll-arz( objectool1 bjec-00$4l t1fop-type( JbectmiddotOOll pll ~co ~n I ~tYll objtc~-OOO1 )I(k [subop(objectOOO6objectooZl Jet] [subopCobJecl-0001objtttmiddot0006 bulltDI
W1TH OCCtlR rICES [~p-tyiC c04le pickupl [piCkUP-ampIli04amprtl )t [UlJtlck-rtll aO31fICet [blftCOJa0-4 )-] suboP(iOObullOl t] S~ampCI lI11tOll1F)et] [ogt-tYIIIOlJunsUCkl [utlsace-arcl iOJamprlb)etl [SICk-afll(olIfll ti )~d~WtT oSlrlb)~ [op-tY1ClcOji-putdow tl [op-tYiiOl )estac~j lsubopamp04bull(W a t lsuooP101ltl4 -
~p- ~Yj)e i07 middotllcemiddot p] fIClp-ar c07Itld)-t] [lStac~lft2(c06a~i)et rgteorll C061Igt7 1et ~SllcllrOOiOZ bullbullt~lC -l1 0UUJe tj l~i-tYj)t c06lllJucltl [uulack-ll1Hc06lrtf )tl [StiCil UI ItcOllrld-t1 10 middotmiddotamprl 10ll)-t [op--tYpt-iIO)-putdowni [optY~CO)asacltl (IUbep 107alO)-tj [SIllOlIjO~iO- a
S ~srmiddotc~rtO middota1 J911 ((Op-rj)tObltltt-0006~ajlicamp1lpl [subopOb~~-OOOIbtC~-OO9)t] middot~~S~lCl -a~iOOec~0094 ooec1middot01 at1[btfore(obect()()94bjecl0006 at] Si1x~obectOlj6QilJec~OOOI S~ICI-UI2r1ec~-OOOIJoe~tOllatj [~p-t~iI OOtctOO9I_sticki ftS~lck-ampll( )ec~middot009400)ec~middotOOoa lCit~il( IecOOO1JOc=J084 I] ~aolIIIfr-)bte-OO looJtc~middotOOoJt] -rt ))middotecmiddotOO 1 ~ _=011-7- oo~ec-OOOI SI(it SlXliMbjectOOO6ooJCCtoolL-rj s~Ooil~btctOOOlo oect-0006 )1
TH OCCtRRlCS e iv4 -gtICit S~x~ 01OJ - ~SilCampmiddotL-tiOJIic aj rgteore-c03)4 )t Sl bO jOOVI a
103
~ ~ bullbullbullbullbullbull ~I 6 bullbull ~~(9O - middot~ bullrQlt=~emiddott bull J ~_ -o cc ~ e )e)Chlo ~ ~middotiemiddot~ ~
Slqiec~cc~Ocl ~at ~~e~ec~Od a Itmiddotce lt01 1~middotimiddotmiddotCC 1-5 lec~~~c ca~rj ittc middoty~~cl ~Cl-~~ ~t~-~~middot~ 16 CiC~ bull i~C~~-middote bull lC-
bullbull~- V~eltCO carX)lmiddot~- middotmiddot)~coqcamiddotK middotmiddotmiddot-middotmiddotv)~ l middotvemiddotmiddotbullbull 4 l _ ~ ~ - bullbull ec~llemiddotxlrd(l~ 15a do~emiddotcdmiddotclS16 ~ =omiddotmiddot~emiddotxnciv9 cl0~ s emiddotjcrcmiddot09 3 ~ sriemiddot )Oc l 0 bull 1middot a ~$ ~e-lltc IOU at sremiddot bollemiddot c 16-1 Iie middot1CCmiddot d C 5 l_~~-e ~ ir~~ l~~lmiddot~~ l-~(l~l)cn [i~=--Y~ c c~~t~ i~-ve 5 middotamp0 - middotemiddotmiddot 0 acaxmiddotmiddot 1- - bull middotv9camiddot)QII CmiddotVlCl 05 a-vemiddot~rermiddot)
ltcmiddot~~middote~n~( 13 i4~ d~middotle~~-~Cl ciZ)a~ do~l~~c~~ cO-odosmiddot ~~rile Ocnd( COS e14 at s rs ie-Joc C12(13)a suc e-i)onc1l cO~ C 11t j Slftl e- ondmiddot c 14 10 )at rIrie xlnc1t1J 09 ~ ~Olr- ypeo c14-abolll atot Yec1J )carlen) [a~mmiddot ~t (1 Z)acarbon itoat tyj)tlt cII aao loaHYpeo c08JacaIonj [uon-yj)C c07 )-carbon ( tom-yC hi 0 )a~ydroler i)
I (cloable-xlnd(clJ c14Jat double-bod( el1cl Za1 douoieboud(cO-O (08)alt sirce ondl c08 14al j slnrlebonc( c clJ)al [slnl e-I)OIIC(cOc 11 at j urC1e bondl el4n 10)1 lSrle- middot)Ond( c 1 JI09 at IOIrtYl 14-carboj lOny I lJ acarbon I tCII ~Yie 11 acarbonJ it01HYpe( C11 ~-carlon itn-ryl c08 carlotl ftOIrmiddot YjItI cO )carbon [uonyh09 arYCroten))
r CO IHemiddotmiddot)Ondl C13c14 a j ~ doule-xllld( c Llcl )] doube-iloftd( CO~()S at [sille boftd( COS 14 scie-I)OIIC(c12cU)at [Slnriebond(cO7cll ~tj sIl1Clemiddotborc1 clZ~05 at [Slnlle-oondtclllr at] 0middotycI4 acuboft] toc-tyPI I lJ )-carlenj aC-tyCcl Z-culenj rype( elL -car~ tormiddottype COS aearbol1] tC-7NcO l-carbon [atoc-Y~l08 )tlydrolmiddot~)l
(~ci)lemiddotbol1c1( cOj06 atl dou le-~d( cOJlt04ja I douoemiddotbond(cOlO a SlCl- bond( C04COS 1 slilebonc( c02cOl)at (Sltlrle- )OuH cOlc06tl iSlnrlemiddot)Ord cQ404Jat LSltlr1e-xlnd( cOJhO3 )_tj on-tyeV6 acarlonj ( too- tYe cOs acaboft ( ~Yt c04 caron] ~octy~ cOJ acarbon tormiddot type cO2 -carbon] (aton-y cOL )-caronj (ltoc-ry~ h04 ah--Clten)
(co ~lemiddotmiddot)Qrdt 0s06a1[dollle-lCnd( cOl04 at j double-Knd( cOlcO ad ~sirClebond(c04cOs)al] Sli leo xl~e cOcOJ-tj [11rle- )OlC~ cOlc06 a( sllIle bondt cmiddot)4-04 )at [Slniie-bold( cOlhO3 a tolrmiddot YiJe c06 aearbon) r tllY c05 lacarboll ~UOIr-Yt c04 carlonj mmiddotrype(cOJ1carbo] ton-ryeI COZ aClbo III [ tl tyf CJ 1 iacagton tlm-flOl Iydroer)
coublemiddotbond cOjcOO at] c1ouolemiddotmiddotond( cOllt04 )- j idouo~emiddotmiddot)Oftd( cOlOZt Slrlie-bond( c04 cOs] sre-~llc(cOcO3)at S1nlie bond( cOlc06 atl Stnlemiddotlollc1 cOZOZtj [slnile-)OlId( cOlet at ton-typeo c06 acaloftj tot-y~ cOsl-carbonj (uom-YC cOoS carboni tommiddottyel cOl acarbonj cn-typeo c02-culoni tOC-7~ cOl aca~n i (uom-yc hOZ)a-ydr)len)
Sustuct-u_O Il~e a ss06~ ([amptom- y ob)ec~middotOOOl bulloroteollorUlociinej snlie-bonc(0ect-00040lec-OOOI )t1OCmiddotypi obtJOOJ -cargtOll rdoublemiddot~nc1( ojec~ooo= b ec )002 1bull I ~C-~YC oblec~middotOOOZ-car~ si~lle)Ond bl~middot0005ec~OOO2t liemiddot bond( )OectmiddotOOOJ~ 1ecmiddot0004 a i ltCr-IYgtt )O)ec~middotOOO6 r--middotgt~ie uom ~ OOlecmiddotOOlt~ -earlon eooemiddot )Ond) 0ec1)()()4)NC~middotOOO t) aHYjJC oojec~ooos aea~on ~doOblemiddotboTld( obecooo5 J ~lec~middotOOOImiddot _J middotsilliebonc1( OO)ecooomiddot ~ )ecmiddotmiddot)()()6Ja ton- tVlelOjectOOO -carbon Sliiebond( O)ect-ooo7 )oect-OOOI i 0- YC oOlec~-OOOI -eu)Qn) i
(lTrH OCCtUESCES rCoublemiddotbond( cOjcOSal [do~olemiddotbond(cO3c04 )atl tdoubie middot)Ono(cOlcO iremiddotilond( cO cCsmiddot at
Stlilbullbull middot)Oncilc02c(mt (slnlle-)Onei(cOl 06 )j SinCle-bond cuO)-t [sriie)Onc( cOl bullbulld alcn-yjJC c06~aeubon i [amptc---YNIcO$)carboft (atotll-yf co) Iuxn olrmiddotYO3 ca~Xl iltln-ylecOa(ar)on] (ltOC-t-yecOllalullon tOUHYMlt l02a~~elen lJy e aC- ~e
(Coaoiemiddot)Oncl e lJc14Jatj [douoi-onei( cl1c12iatj [eiOIl biemiddotbond( cOmiddot c08 a sIebondl c081 1J a $iie~nci( c c 13)wt (SlAlle-)Oncllc04 ell at ~JlnCleoond cl05 srie-)Oftct ell Jl a j ln~ ~ aClrxlnj rtoc-YiMI cU)-car)on1 atlm- y c acaxn eYlc 1 bull -)0 ~y~ cOS aUlCfti (atom-y~ cO)acarboft 1[atoc---~ br middot~lltlr Qr -ye- ~08 arYC~ieJ
~ cooemiddot)Onc d bull 15 tl Ldouble-30nd(clsc16 ti (dOUMmiddot)cllC( CIJ9 10 al sl~ie rodl c09c I lt Li emiddotl)Ond(c16cl at (Slnile-)Ond(clOcls iatl [slnr1bull bond cl- la~ sncle--Ilt (1 U~ IOIT-tYl eiSJ-carbon I(ltommiddoty~ cl aQrllon (atom- y~cl 0)acarJon I~Ormiddot~Yel cls bull arleni aomrylclO iaearboni ~amptom-y c09 -carlelll (uommiddotyeI hI 2 1_yttrCf uOtllCI Lalodj~fj
OlSCOereltl h ~)icwnl [0 SI)stmiddotc~middot~1S ~n 41166~ soncs
I Susmiddotc~e~ Vllue 3436344 snll-~lIc( )ectOOlJ)ojectOOJOla middot1middot~1~middot~ oectmiddotoooq ~yd)ie~ 1IC~ 7vpe ooiecmiddot001 middott---docen i snie- lOnd( obteetmiddot0016)oect0013 a ~nifmiddotbonc(OecmiddotOOI ooet 0009 11 - tI)Omiddotec~middotOO 1acagton delOole-xld( )o)ect-OOlOI oec~ middot001 - ~c-_middot y 0middot0010 camiddotlJ SIie )0 lie o~ec middotOOlJooeemiddotO()1 0 la~ (sine emiddot )Onei( )oec-OOllJO eelO() = 7] tommiddot middottpt )ecmiddot0014 -- c i~ r Yelt )ot middot001 acaxn dgt )je- )Odi )O)ec~middotXH ~~b)middotOOI 1 ~~ ye- ~olaquomiddotOOl a~alO ciOmiddot)je x-Cmiddot ~ laquo-001 J )i)t0016~1s ~texlnci oOlectmiddot(lOl 1 ~)oll a -y~ ~Olaquo )Ql$ aCl )c Siemiddot -ene ~ iJ()I gttc001 0 a HC-~middotJ)e 0 lec )() II) aC gten
~mi OCC-ilRfSCS ~S~i emiddot )O( bullbull ~ ~ ~l ~le ~ ~~~ 05 ~ye ie~ ~at~=~middote ~ ll middot~yCi~~ i eo middot~~ct 1 bull ~ bull
9
SlS middotC~~~t~ ne 3J 363JJ iee-)()nd(~blaquo~middotOO IJ ~ ec~middotOOJO aloCi-y~ ) ttmiddotOOO9 ~c~se 11 lOtt middot0013 -e~oiet slie lt1nCl( ~ bec~middotOI) 6 ~bmiddotti~middotOO I ulle middotiJorc ~otc middotXI O~(~ )00910- Vgte 0 ect-OOII -ca~r j do1biexdl) b)titmiddotool Olltcmiddotooll i OITmiddot II ~litimiddotool 0 ca~XI J ~slIilebOnQ( OOti~middotool Jobtitool0)-t slnle bond )]ec~JOll )OfC-OOtZ lt (atoa-Yl0bec-014 lve~ie omtYj)I 0 otit-OOl bullca~n co~bifbolld(ootimiddotCOl~Iec~ middot00151 ~aotnmiddott~ Jti~middotOOl J -I=bon I doli bemiddotbonc1( 0 ))ectOOIJ)0)ecOOI6 d srr1e)()lClt ~beCmiddotooU ob)ec~middotOOI4tl ampornmiddot ~ype( ~ btt~middotOOlS ca~rI Sir-I itmiddotboado oJect-0015 ooect-0016 tj lltorn-rype( 0 0Ject0016)-Cl~II)1
I Sulst~~clreO vIlut 1 I[ atommiddot ye(coecmiddotOOJO)rolrnecoloTlreocinci sllce oord1 olaquomiddotOOIJ QJec ~OO30)shyamptomtype ~JtiOOO9 hydoien uommiddot~rpeooo)ect-001S hcoceni LSlAile bonc1~ooJecmiddot0016 jttmiddotOO18 sil1lie)odlooec-OOllobec~OOO9 ~ 1torH)middotpeo ob~ec-OOll)-ca~n do-u~middotbonc(oolaquot0010Q0ectmiddot0011- tom-r-rll ollecool0)-car~n 1snte-onltl( ooec-OOI Joblect-0010)-t~ SUllltborci(bjec-ooll oojttmiddotool - j lltor~middotP Qojec001 4 lydoaen IltOClmiddotY)e olJec-oot-cabor] [doublemiddotbocci oo)ecmiddotOOl OOjCC-OOU t1 lCllT-typto oecmiddotOOt J -ca)o 1 do olemiddot bol1(~ 00lecooUloectmiddot0016 al (SlIllie bonc( 00 ectmiddotOO150 o tit middot001J -t om~yeoobect0015 ~ -campOon s~ilemiddotbonltl( oOJectoo15ooecoo1 0)- rltommiddotye- coect-0016 -c~rlcn)1
Addina substructures 0 SKbullbullbull
O$covered S-uOstJcue5 vll J4J6Ja4 l[Silmiddotondl OOectoolloblectooJOI_t ~tom-~y~ jecCOO9 ~ydofe on- type( oectOOUeilltilen (unlle-bonli(3oICOO16ob1lCt-OOUalj slnclebondl ooecmiddot)()l ooecOOO9 Je atolTmiddot I oOtt~-OO1 t)-car)on couolemiddot)Ond( obec~middotOOlO~ojec-OOll a] (Itor-ioojecmiddotool 0 -(~~Ior j 11C )Ond( obtimiddotoolJ IoecmiddotOOl 0 t [slnleOond ooecoollooec)()l t [uorrmiddotYII oecmiddotOO J nvceiel a ntyll 0)laquo-001 eeaIOn] dOubt)oIc1(00Iectoolt~oilC0015 it (I tOITtype4 ooectmiddotoo J e(a~~o j co~oie- OoQcU OlICmiddotCOI JoliecOO16~t sil1le-Xlne(ol 0laquo-001~oect-0014 amp omiddotpe( ~ oecmiddotQ015 gton I ni lemiddot )ond( 0 bec~ooI~blec~middot )016 )t tommiddotrypeo 0 bect-OO 16)camprOcnj)1
5geceO SuOs~rJc--reO -Ie 1 ~[uom-YI ooec-gtC30 lororneciorireodilei siebond(oOjec~OOI3~i))ecOOJO-tj at)Ir-~~middot cbmiddotec~middot)()09 nyd~Ietl amp~- ~7pt1~ )ec~-OO 3 - ~oiei sncle-bo1Ii(cbIC~-OO16~oec~-001 lati (unlie-bonc1(Iojtit-OO 1 blecQ009 ~~ ~ t~Irmiddot~Yt ooecOO 1 -cubo cOuoie-bondl ob)ectmiddot0Q10 3oIC-OOt 1)-t [Itomtyp Oec~middot0010 -arboll ~Slrlle-Iolc eolaquotmiddotOO 1 J~ ~ect-OO1 OJ- sinc1e-oldtooectOOl1ooect00tt] 1 tom-ype COec~()o14hyc1r~cci amptonmiddottype lOec~middotOO I -caIOn j eomiddotolebollc1ob)ecmiddotoo12~olecOOl ) tj [amptom-YllOolec~middot0013 eCamprlOl [ciooc boncSt lec~middotvOJ J tt-OO t 6 lniie-bol1~lbec-OO15 co~t-OOUatJ [aUlmyp olOec-OOt5 Camp1101 Slnimiddot bend )ott~middotOO ~)ecgtOO t 6 bull lt tom--rypeo 0 oJect-0016 -ca)oll j) I
A3 Experimeut 3
Experiment 3 denonst-ltes SlBDCEs abiiity to iisover ost1ct1res at an oe usee 15
hlgb-leei attributes for dassifYlng mUlti]le uamples Tile input examples cr5ist If le
desCiptions of ten tlms The DefExample calls ~or each eumple ue shCmiddotlwn airg ~h 1e
99
sri~ el c2~ c~)C cl cJ QOe c 4 ~) S~if de middotli~ Jo ~ dot c~ ~6 ~
5lile cmiddotcS)t ldo)e middotc9 ~ dolloe (eIOHSlf c9cl ~if ~IO~ COmiddot)I 1 c 5rieclleI4 ) ~lIilceU)~) dooeeI4clO$ilif (5- Sle c16d bull cr~le cl- ciS) siecie c(H20) ~silee c2 c2l sie 5 e bull sI1 9 cO Strtlcmiddot cO cl ~ ~ se c c ~) (SJ~ie cO c~J) ~ sU~ir cO ~J s~ic 1 e2S ~
1~Ie c2 lt6 ~u)
QefXIple OlOIId 6 ei1tIV) ((el c2 c3 e4 lt5 co c c 9 clO ell c12 e13 c14 cl$ cl6 cl1 ciS cl9 cO cl 2 cJ ct c5 ce 27 c c9 dO 31 32
c3J eJ41 (( mlle lil) (doub lUI)) (( SIlllie ct e2) t) (douole (cl eJ) t )(dollble (e2 c4) t) (Slle (cJ c~) tHsinele (e4 lt6) ) (double d c6 1)
lt sUlIe (e7 (8) t) (double c c9) t) (doullie (c8 cl0) t)( sille d cll) t)(sule cl 0 cl2) t) dolO bie (ell c12) ) (sInile (elJ (14) ~) idolOble (clJ cl5) t) (double (cl4 (16) t) (slnrie el5 (11) t) (SllIlie e16 (15) tl (dou bie (c17 cI) t (IInlle i cl9 (20) t) (double (el9 c2l) ) (doubl (c20 el) t)( $Inle (ell c2J) t) sIlle lC (24)) ~01lbi cJ e41 tJ (111111 (6 (26) t (sInlle Icl1 c11 ~) (Sl1II middotct c2S) t Slllie le4 e29J sltle c25 c26 ~ ) (sinllbullc26 eZ) ( (sinilbull co7 cS slIllie c2 c29 ~J
SUllie u29 clO ) S111 Ie (c26 eJI) ) (siftei c21 cJ2) t)( unlie l cS cJ I ) sanlle c29 cJ4) ~))
A52 Output for Experiment S
gt (sulxh bullbull lmu )
Plauers ~imit bull 1 cOl1lec~ij~y bull t eon pacress bull =Ivee t Semiddotbe a Ill discoVft a t si=1 lze bull nl ll2C-bk e lll
RUMinl sulntucture discovey
DiscoveTee ~he (ollowl11 7 sullstrllctures m l5UJJJJ seconltl
Sustllctlre- Talut 154007 (illlle(obK-0O11jec~middotOOI $)) ~co1bil OOecmiddotOOlLbecOOO9 )( douole(obectmiddotOOIIbec~-OOOl)] SIneieioblec~-OOOjobJectOOO9 Jet (cioulllrOClJecOOO$ojectOOOl jt SUIeI 0 bjecOOOl 0 lIjec~middotOOO2 iatD~
WITH OCCtRRENCS ([sillilel e4eo)at] [d01loieic5c6-t] (dOllble(ce4Jt (uic( cJd)t i [dcule(clJ) lSI~ljel de t I) ([SUiie(c1 4e stJ dolloelclJclJ)t) (double(c1114t] SII111eicl c1 ~ at [doubilcl Oell ~ [sinl1 (1 Oe I _t])1 ([Silllle(c4c6at] [doubiel cJ(6)t] (double(c2c4Jatj (sinee( eJcJati [~ulelc1cJ )t SIIIII e le21at j) I (SIIie(clOCI1)t] dCllblecllc12Jat] double(celO)t [sielie c9cll )j [dOltilci9 Itj ~snmiddotllc cSlt 11
( [Sillilel clc2)a~] double c20(21)t) (dOble(el9 cl at] [smi lei cl S c20)t ~~oubil (1 ~e I t (Sleeil c 1 - IlJ~ 111 c21cS)-t) d01ble(c26c21)atl (double(clJe21t] ~Sillrle(e4c26)a( lioulIecJe2 t [slel(1 cJ j gt l(mi1 c4e6middott j (doublel cJc1it [double(eC4Jt] [Siftti eJcJt doul11 c1cJ)t rSlnrl1 cle2t] I ( siriilcI0e12tj ~lioublel el1ell)atj [doubllaquocSel0)tj [SlAI1 dcll a~i ~doubleic l_tj snl1 c8 l( I
(SIllil c16c18 at [doubie(c17eU)ti [doUblttc14c16t1 (Sllle (Ud -t dcuOiecIJc 15Jat $1pound111 C13 I a))1 ~SlCIechl9t) [doublelc21clt)-tJ[doubltCcJ6cJS ~atHsanlle(cl$clmiddot let cgtublecl4clj)e ~ SIrlel c2l 6 iIi laquo(SIrCieeJ4eJ5)at) dolbl cJ3eJ5)1 [cloubl cJZeJ4ltj [sftt1e(eJleJ3)t [dou~il cJOcJ I j [sinli cJOcll at j)) C(SilIlie(c4()e4UtJ tdoubleleJ9(41)t] [douollaquocJlC40JtJ [sanlie eJ- eJ9)t [coubiMl6 eJ~ ~Siit cJ6JS bull j I ([silrle(e4c6) (doullle(c5e6 )tJ [double(elc4 )at] [SiAliC( eJcSiat [doule(clcJ tj Uftlle(C lelI]) laquo(SUlltCcl0el~-t [doublcllell)) (douole(cc10)tj [su1t1e(delltj dobiee~9)atJ (siellel cc 11 GsUI c4c6)at (doullle(c5c6)tj (dc~ I1e(eJ4t] Sil1lie(eJc5 at idcublec lcJ J~ Snllel clC)t (~SU~lie(cl0elljtJ [dgtubil clle12t doUoleceIO)t] [Slnli~c9CII t] tcCOilc 91e r Silel c~cS J t I
[SIlIe(cI6c18 tJ [doubleC I ~ c IS )t [dOlOlIle( c14e16 tJ ISI1IIec1 c1 lt (lt1 c l3e j ~ [S(tlle c 1l~ 1J Jgt
~sirljl c4~ I [douoleicj~6~ tl (doublc(cJC4)t (Slnlle( cJc5 t [doUOIe(c1cJ )~ snrleelc2~1 CSlnlc( cl0elt dcuoll cl1c1t [doubie(cclO)tj (Sieie(d cIUat rco~Iec19 Itj [SIilte~d i (([SIl~Ic( cl6cl 5t) doubil el ~e1 Stl [dollllle(e14c16)t ~snllec15cl- lt ~doubjei cl J cl5) SIAC1 eU I sa]) i~ Sl1ic(co24) dbil cJel4atj [doublelcZOcl~tJ lSll1leI ellcJ t oloil cl9cll J-tl lSinitmiddot ~ 90
Suon~lIclue-6 vlluc bull 6602 rclougtle(obect-OOUobj~()()OltIJ la d)~l1 )bec()()IIobec~OOO2t mieI oOJectmiddotOOO5)o~-OOO9 atJ eoUOlelcbJfC~-OOO~obtC-)()()1 )t (SJriCI)bec~voolooecOOO bull J
ITH OCCtRRESCS ~d)Ult dc6 t eou)it et S-ilel cJd -~ [eou blr ClJ- sneI el c l middot
~~cIiCldeo ld domiddot~ r c1eJ se cJ 6t l-O6 i c4 sniCI cLbull i(~COl )1 c4 bull( emiddot~et 3 ~ S~i~~ (4(6 Jt COtl 11 eJ o_t s~ie eJ~ I
10$
bull 11 _ L 4 ~ bull 1 -J - bull )~_e 1 bull1 c ou~tle_ bull l_ie-e bull e bullbull lOUliel-c5lt= SIeI4C~ douIelC2C4 bull t HiC e2]) shyOUlleclcJlat ~jlie(e4c6) doubicc~e6middot ~INJcS bull D
icule c1C4 t SIIiel-C4CCgt t lOUblccSc6 t $11 cJC~ ~D douJie d eJ) [llIelC6d) doublel cSe6 t lll~I cJcS tlgt cuiei c I Jel 1 (llatI c llc1J middott] ~doubeI clOcl1 SIM3c I O~ ce-c1J bull middot 1riCmiddotcl UIo3] c10HeIc10 ell 1 SiIIIIccIOl)t)
([e~uMel 3e15 11111laquo cllcl Jtj [dOUlielC10cll at Ullic(cl Od 1()1 I~rdCu liecl0ell 1 ~UiC e 14c15 )at] doUOie( e11cl4 a [sniie(c 10el1)I) I 1((douOle(dJelS)t [SIitI c14eU)at I [ciouble( cI214 i-ti [SIrile(cl0c12~I) l ([double( cl Oc 11 )- I ~ [Iu lei c14c15)t1[double c 13c15 tl [1111 Ie( cIlc LJ )t])1 i([doublc(clclamp) Ii [SueI c14cl5)t] [cioublelc 13c15 bull tl (srile(cllclJ)tJ)1 1([doule(c2c4let (SUIIIe(cJcJ)t J [cioubltlclcJ)t1(siAic1cl1 DI I(dOIJolelc5c6l-( (uqltl cJcJ )1 J ~ cioIJble(c1cJ t (sil1ic c1eZ)IDI l(idoule(clcJ)t [sqle(~bullc6)-tJ [ci0l1ble(clc4)-I [SiDltlclcl)-t])) ([dOUlle(c5c6)-I (siJJIe(cbullc6)-tj [4011)Ie(clc4)lj (sqtecICt Dl (doubleC1cJ )t lSllIrIe(c4c6) I I [40IJole( e5c6 )ti [siDitlcJcJ)tDI C[doulielc2c4 )t rslne1e(c4c6 )-tJ ldouolelcJc6) Ii [SiDIelcJcJ) tD
((dou)ie(c1cJJ-t (Slneelt=cI4)ati (4ol1ble(cJc6jtJ SllIlle( cJc nt i) I lt40ullfcampcI0)-tJ unilcu9cll)tl (ciouble(c7c9) rsillltc~1 -~LI 1((doubiecl1ciZ)al [sil1elc9c1 Ht] (doublelc719)Ii [siJiie c7 c~ )al)) 1([ciouoie(c7 19)1 (uAIecl0cl)al] [ciOl1ole(cIcl O)-Ij [S~ile(c7 cS)-ti)1 ((dou lIe(cl ~e11)-t I[SilleI c I 0c12 )1 I doubiel dc O)-t (s1l~lle( c ~cS I j) I 1([Ooule(c 19)-1 (SItitlc 10ell)t] [double cl1c121) ~Slncltlc9 11 t) I ([doule(clcl0)eJ sII1CIelclOcl1)t) [doubltlcllc12)-t ~sieilelc9cl1 laID ([doue(c7 19)al 1[S111lie( c 12cl5 )t] [dol10lei ell cl1 j snllel c9 I1-tDI i([dol1ole(e10cl2Jlj [sineielCUcOt] [cioublelc17cI8JI [slllClcc14cl )t) douoie(el6c1t [SUlC c14c6 )atJ [cioublelc1Jc4Iet] [SllCle(cUclJ)ID) ([cioubie(c19 C21)a l (siniCcllcZO)t [4oubltlcl ~ c18 _t] [sl1lCIe(d cI9)tDI (([douic(cOcl)t ~sil1le( ellc10)t] cioublelc17ell Jtj(sU~Cle(cl bullbull(19)tDI CdouIe(c1 ~ clS)at (siAiel c21 cZZ)li [double(cl 9 11 it~ [SIlCle(cl cI9)t)1 ([doule(clOc2)t lsinelcllc1)tJ ciol1blele19c11 lati [slnlle(c1 c19)t) (idoule(d ~ c15 )t [ulre c11c2Z)tJ [ciolJblel c20etJ [SiAIIe(cUclO)ti) ICdou)le(c19 c21 Jet [lirIe( c21cZZ)t J [double( clOCl)1 j [slnCle(cI5clO)ti)l i ([dol1ble(c2$cr- ~t [SUlie( cl4cl6)atJ [ciol1ble( c2J c4IetJ [SIAl Ie( c2J2$) t)) ([doulelc26clS )tj (sililiel c4c26)-tJ [ciOl1ble( clJc4)tl [sltlCle(clJcl5)t)j i(doule(c2Jcl4)t Iilele7 c2I)I ~doublelcl5ci)tl ~slnCle(clJcl$)ti) ) ([dol1le( c6cl )at [SlnlelC~ cS)t] [cioubie(cSc7 at) ~Slnlle(c2JcWtj)1 ([~ou~le(c2Je24 )t] [Ulie( e7 cI)at] [ciOUJCc26cSI iS1ni1e(c24C6)t)) (((cicule(cl5cl)tj [siqeI c cZl)tJ [~ou~itltc6cZS ti Stlilc(clolcl6et)1 ((d0I101e(c2C4Ja t [siJJle(cJcJ)-tJ [4ouble(clcJ)t SUlCc1(2)tDI ((douolc(c5c6 Jet [Sqle(cJcJ)et) ~ciouble(dcJ )ti [SiJlICelctDl i([cioul c1cJJt j LsiDIIe(c4c6)tj [double(clc)-t I siIC c1ltID ([dOI1i)Ic(cJ c6)et [Slqle( c4(6)et j (double( cc4)t l UliCc1c2~~DI 1((4cule(clcJ)ti [sulic(c4c6)-t1 (ciolampblc5c6)ti [siqituJcJiatDI Cfdou)lc( cl(4)-t (sLnllc(e4c6)et1(double(cJc6)tj [silllC cJcJ)DI l((double( c1cJ Jt [nt-lie( c6clO)tj [dolampbllaquod c6l-ll (siQCie(cJd )t) I
([dol1ble(cSclO)tj ~SlIlilll9d 1)et) [double(c7 19)t]lSilCielc7 cl bulltl (~[ciouOIe(cl1cllit sUe( c9 cll)t] (dollble(c7 19)-t SUle(cc )tDl ((doue(c7c9 let [sqe(c10cll)et1 [dolampble c1cO)-t) (sine Ie( c7 cS )at j)I Ifciou 1e(I1c1) ti [SiqitltclOcll)et] [doublCC bullbullc1 O)tj (SUlie( c7 cS )tDI 1[double(c1ctl-tj [Siqle(c10cll)at] [double(c11cllj-tj [Slllltl19 cl1 )) J([dol1b1cltcclO)tJ SUlllec10clljt] [doubleclld1)tl [SUlIe(c9cll )-t)1 ~[double(c7 c9)-t [siqltcUc11) tj (ciouble(CllcU t] [Sllllllct cll ~D ([dougtle(c14cl6)tj [siqle(c15d IJ ~dobict c13el5 1 Sitlile(c1Jc14l t) t~[dcuole(c1~ (15)t [sietlcl5cl )et] [ciouble( e13cl$ -( slqle(clJc1t) 1([dou)ie(clJd5)ti [SleCelC 16c15atJ (~cublt1c14c16 let [Sltllle(dJc14)t) ([dou)le(cl ~ cS)t1(siJCielc16cU)tllciouble( c14c16l-tl [Sltllle(clJcl )at) l l(dC1Jlle(cUc 1S)t [sieIcc16cU)-t) rcoubitW c18 it [Sl1lIle(cS cl -()I idoublC dcI6)- lSUIelc16cl I)tj ~doublet c1 ~cU )t (snlle(c15d () ([d01JIe(c13c1$ t] [sintI c1 S)t i [coubielc11elS t lllnclc(d5cl -lltgt douoieicl7 c9)t [1IfI~25cZ-tl ~eoubi c4c25 bullt stlllec10c2( ICd~ulie( cJJcJ~ et sirir eo3 lcJJ 11 coubltl cJOcJ 1 at[Stli 1e( c2lcJOt) bull [u)lt cJ9 c41t 5Crc3- 9 )t douOlelcJ6cJ 7t] Sltllif ccJo)~1 ~do~ c26c2S)( slIlaquo ~ c~middotmiddot -Iuoitl C4c~ sr~ie(c2~c6 ~c~ole(7 ~l9 Jet 1 e-~ jC - OJolec4ct SAi~e(le2~ J~
101
middotdo)eltI-clS ~SilecI5C-lt cc otcLJelmiddotmiddot 1ieclJC ~)Ie 13 1$] m1i1rclO 15a~ cOJbtc1416)a SIi eI c13 a co~~eltd 718 at Sli1eltcI6clS a ec J~c1416)a usel ell l~ a dJuelt cUcl$~ Sli1e- cI618 t [coublt cl- 15 at 1iM I - a ~JIif cI416al Milt clo 5a~ ldo 1 oitd ~ eli at gtieI cls a
oemiddotclJcU middotslile- c i3 COfcl- ) 1lieltC$ CJ )MOZ s 11elt c21cJ o coJbie c19 clJ [siielt 19 0
CdOMSC4 i Sl1i1M ~J)tl [do biCl c19c nt [Slq C c190 o 1) I ( dn beIC1 9 cnmiddot l SIIilCl clc4)t] [do~bltUOcl )tl [Slielt c 19 0 ~ ([doublelt clc4) Ii Slnllelt cllcZ4 )_tj [doubl~cOcl)t) rsincelt c 19 e20) j)lcdoubleltc19 c21)t i [snllelc2lcZ4)-t] [doublelt c3c4) t j [urllekZl c2l I)) i Cdoubieltc20c2Z11 scie( c2lc4Jtj [doJ ble c3c24)1 I [SlIliie( cnCl 1 1) lt~oubif c19 e21) 11 (Slllllelt c24c9) Ii [dOlO oiecZl24)t] ~Slrlie(elcJ 1])1
Ed ~lCe
109