A representation of complex events and processes for the acquisition of knowledge from texts



Elsevier. Knowledge-Based Systems 10 (1998) 237-251


Fernando Gomez*

Department of Computer Science, University of Central Florida, Orlando, FL 32816, USA

Received 23 April 1997; accepted 22 September 1997

Abstract

Temporal event hierarchies are proposed as a technique to represent knowledge in texts describing processes. The representation is that which a program, without prior knowledge about these processes, will be able to acquire by reading a description of a process. The representation is intended to be used by programs that acquire knowledge from texts. This representation is viewed as a bridge between the English description of a biological system and deeper representations of systems such as those proposed by qualitative physics and model-based reasoning. The representation is based on temporal relationships that are used to index and organize knowledge about biological systems (circulatory, respiratory, urinary, etc.) in a hierarchy of events. Several reasoning methods that the proposed representation permits are explained. The event centered representation proposed here is also related to object centered representations. Finally, algorithms for the automated construction of the event hierarchies are explained. © 1998 Elsevier Science B.V.

Keywords: Knowledge representation; Event-centered representation; Knowledge acquisition from texts

1. Introduction

A process is a complex temporal connection of events. A system is a complex temporal connection of events and also of interconnected physical objects. The functioning of biological systems such as the urinary, circulatory and respiratory systems and the operations of mechanical systems such as a heat pump and an internal combustion engine are prime examples of processes, but the concertina movement of snakes and the eruption of a volcano are also examples of processes.

The representation of these processes is studied from the point of view of acquiring knowledge from texts. What kind of representation does a program, or a human for that matter, need to have of the circulatory system or of a heat pump, say, to be able to answer questions about their functioning? This research is concerned not only with questions requiring a simple retrieval of facts, but also with questions that require problem solving. However, the discussion does not center on expert models; that is to say, the model that an expert would have of a process after many years of training. Rather, it focuses on the model or representation that an educated layman would need to construct from reading a text (say, an encyclopedia article describing one of these processes) in order to be able to answer questions about the process. This research addresses the issue of common sense reasoning and problem solving as they manifest themselves in the comprehension and acquisition of knowledge from natural language.

* Fax: 00 1 407 823 2764; e-mail: [email protected]

0950-7051/98/$15.00 © 1998 Elsevier Science B.V. All rights reserved PII S0950-7051(97)00047-6

Recently, there has been intensive research activity in information extraction, in which the goal is to read an unrestricted real-world text, identify some relevant fragments and extract from them pieces of information by filling the slots of a pre-established template [1-3]. Because these systems filter irrelevant information, their output can free human experts from having to scan masses of texts searching for some relevant information.

The acquisition of knowledge from texts for expert systems has been approached by semiautomated systems [4-7] and by fully automated systems. In semiautomated ones, the user is responsible for helping the system to deal with some aspects of the process of acquisition. In Ref. [8], the authors describe a fully automated system called SNOWY, which is able to build classification hierarchies from natural language without using elicitation techniques [9]. The system was applied to the construction of a consultation system [10] for the selection of a programming language. (See Ref. [11] for a discussion of knowledge-acquisition tools illustrating the complete development of the acquisition process.)


In Ref. [12], a system is described that deals with the acquisition of knowledge from physics texts, a work that is akin in purpose to the one reported here. In Ref. [13], SNOWY is extended and applied to the acquisition of knowledge from encyclopedic texts. The goal is the bottom-up construction of the final knowledge representation structures from the logical form of the sentences, without intervening scriptal or template-based knowledge about the topic. Hence, our system does not start with a frame containing the main slots to be filled for a topic, say 'volcanos,' as in recent MUC and Tipster projects [1,2,14], but rather it builds everything relevant to volcanos from the output of the interpretation phase. Recognizer and integration algorithms are activated to keep memory organized as new concepts and relationships are integrated in memory. The integration algorithm relies on a classifier algorithm to classify new concepts and relations. Thus, every sentence undergoes a parsing phase, a semantic interpretation phase [15], a formation phase, which builds the final knowledge representation structures, and an integration phase [16].

In this paper, we describe the knowledge representation structures necessary for the program to acquire knowledge about complex processes. We assume that the knowledge in the program is that of a layman who is reading about the circulatory, respiratory systems, etc. for the first time. The representation proposed here is viewed as a bridge between the English description of biological systems and the representation that, say, an expert biologist could have of such systems. The proposed representation is a necessary element in the acquisition of sophisticated models from text books and in explaining these models to beginning students. The long-term goal of our research is to build an integrated program that will acquire a representation of a biological or physical system by reading a text or by interacting with a teacher and will progressively refine this representation until it becomes a sophisticated model of the system [17-19].

Our language [16] for representing the knowledge in expository texts is a classification-based language, which indexes concepts in LTM (long-term memory) using three kinds of relations: necessary, necessary and sufficient, and contingent conditions. The main problem with classification systems when they are used to represent systems is that the relationship they use to index concepts, namely the is-a relationship, is excellent for indexing concepts denoting objects, say the 'heart,' 'lungs,' etc., but is of little use for indexing knowledge about biological or physical processes. This is because knowledge about processes is not organized around physical entities or objects, but around events. However, some type of hierarchical relationship needs to be used to represent the immense amount of knowledge about any biological system. Otherwise, retrieval, reasoning, recognition and integration of new pieces of knowledge would become computationally intractable. In the following pages, a system is represented as a hierarchy of events linked by the subevent relationship. Besides this relationship, other relationships, of which the while-merge relationship is the most important, connect the events in the hierarchy. Each node in the hierarchy represents an event. The subevent relationship does not replace the is-a relationship in the representation of the system. Rather, the final representation combines the two indexing relations: the is-a relationship is used to index the objects or entities mentioned in the events and the subevent relationship is used to index the actual events.

In order to give the reader a sense of the problems being studied, consider the following four texts describing fish respiration, blood circulation, the urinary system and the intake stroke of an internal combustion engine, respectively. These texts were stripped of sentences that are not relevant to the processes being described and also of redundant sentences.

Fish take water into the mouth. From the mouth, water enters the pharynx. From the pharynx, water passes into the gills. Blood flows from the body into the gills. As water and blood pass through the gills, oxygen in the water passes through the membranes in the gills and combines with blood, and carbon dioxide in the blood combines with water in the gills. Then, water is expelled out of the gills and blood flows to the body from the gills.

The heart contains four chambers. Two of the chambers receive incoming blood. The other two chambers pump blood out of the heart. When blood is pumped out of the chambers, valves snap shut with a 'thump-thump', which is known as the heart sound. Veins bring blood to the heart from all parts of the body. This blood first enters the right atrium. Then it passes through a valve into the right ventricle. From there, it passes through a second valve to the pulmonary artery leading to the lungs. In the lungs, the blood picks up oxygen. The pulmonary veins bring the blood back from the lungs to the heart, where it enters the left atrium. The blood passes through a third valve to the left ventricle. There, the blood is pumped at high pressure through a fourth valve into the aorta, the main artery of the body.

The artery entering each kidney divides into a network of blood vessels. Each blood vessel ends in a tuft of capillaries called a glomerulus. Each glomerulus is surrounded by the end of a tube, which forms a capsule. When blood passes through the glomerulus, a certain amount of blood, called the filtrate, is filtered through its walls into the capsules. The filtrate passes from the capsule through a series of tubes to the entrance of the ureter. In the tubes, the fluid is changed to become urine, but most of the water, glucose, amino acids and salts are returned to the blood. Waste products from the liver and other organs are eliminated in the urine. The urine flows down the ureter into the bladder.

The first stroke of the sequence is called the 'intake stroke.' During this stroke, the piston moves downward and the intake valve opens. This downward movement of the piston produces a partial vacuum in the cylinder and fuel mixed with air rushes into the cylinder past the intake valve. The fuel comes from the fuel tank. When the piston reaches the bottom, the intake valve closes.

Consider the description of fish respiration and the question: 'which places does the water pass through before getting to the gills?' This question can hardly be answered unless the events describing fish respiration are grouped and indexed together. Moreover, temporal questions, e.g. 'what occurs when water passes over the gills?', and what-if questions, e.g. 'what would occur if water does not flow into the gills?', could not be answered at all.

Allen's formal analysis of time [20] provides us with the best tool for defining our concepts. He treats time as intervals rather than as points [21], and intervals are an essential aspect of our notion of subevent. For every temporal relationship introduced in this paper, a formal definition will be provided using Allen's concepts. For those readers interested in the formal definitions, those concepts in Allen's paper necessary for following the formal definitions will be introduced first. (See Refs [20,22] for a detailed description of these and other concepts.) However, it is important to emphasize that the concepts and algorithms introduced in this paper are independent of Allen's logic and that the purpose of the formal definitions is to unambiguously define and express these concepts. As the reader will see later, the reasoning performed by the algorithms is based on tree traversal, not on deduction from a set of axioms. Allen starts with a set of primitive relationships that hold between time intervals. Of these, at least the following are needed to understand the definitions given in this paper: STARTS(t1, t2): time interval t1 shares the same beginning as t2, but ends before t2 ends; FINISHES(t1, t2): t1 shares the same end as t2, but begins after t2 begins; MEET(t1, t2): t1 is before t2, but t1 ends where t2 starts; DURING(t1, t2): t1 is fully contained within t2; BEFORE(t1, t2): t1 is before t2 and they do not overlap; OVERLAP(t1, t2): t1 starts before t2 and they overlap. Following this, Allen proceeds to define the predicate IN, meaning that one interval is wholly contained in another, as follows (if all variables are universally quantified, the universal quantifier is omitted):

$$
\mathrm{IN}(t_1, t_2) \Leftrightarrow \mathrm{DURING}(t_1, t_2) \lor \mathrm{STARTS}(t_1, t_2) \lor \mathrm{FINISHES}(t_1, t_2).
$$

Another of Allen's predicates that is needed for our purposes is OCCUR. This predicate takes an event e and a time interval t, and is true if the event happened over the time interval t and there is no subinterval of t, say t', over which e also occurred. This is captured in the axiom:

$$
\mathrm{OCCUR}(e, t) \land \mathrm{IN}(t', t) \Rightarrow \neg\mathrm{OCCUR}(e, t').
$$
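The interval relationships listed above can be made concrete with a small Python sketch. It rests on an assumption that is not in the paper: each interval is modelled as a pair of numeric endpoints (the class `Interval` and the function names are illustrative only), whereas Allen treats intervals as primitive. OVERLAP is given its usual strict reading, in which t1 also ends inside t2.

```python
from typing import NamedTuple


class Interval(NamedTuple):
    """A time interval with numeric endpoints (an assumption of this sketch)."""
    start: float
    end: float


def starts(t1: Interval, t2: Interval) -> bool:
    # Same beginning as t2, but ends before t2 ends.
    return t1.start == t2.start and t1.end < t2.end


def finishes(t1: Interval, t2: Interval) -> bool:
    # Same end as t2, but begins after t2 begins.
    return t1.end == t2.end and t1.start > t2.start


def meet(t1: Interval, t2: Interval) -> bool:
    # t1 ends exactly where t2 starts.
    return t1.end == t2.start


def during(t1: Interval, t2: Interval) -> bool:
    # t1 lies strictly inside t2.
    return t2.start < t1.start and t1.end < t2.end


def before(t1: Interval, t2: Interval) -> bool:
    # t1 ends before t2 starts, leaving a gap.
    return t1.end < t2.start


def overlap(t1: Interval, t2: Interval) -> bool:
    # t1 starts first and ends inside t2.
    return t1.start < t2.start < t1.end < t2.end


def in_(t1: Interval, t2: Interval) -> bool:
    # IN is the disjunction of DURING, STARTS and FINISHES.
    return during(t1, t2) or starts(t1, t2) or finishes(t1, t2)
```

Under this reading, IN excludes the case where the two intervals are identical, which is what the OCCUR axiom needs: an event occurs over t and over no proper subinterval of t.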

This paper is organized as follows. The first part of the paper (Sections 2-5) introduces the main temporal relationships used in the representation. In the second part of the paper, Sections 5-8 explain how this representation is used for reasoning. Section 9 relates is-a hierarchies to the event-based hierarchies presented in this paper. Section 10 provides algorithms for the construction of the subevent hierarchies and Section 11 gives our conclusions.

2. The subevent relationship

An event may consist of several subevents. For instance, the event underlying the sentence 'Peter went from Orlando to Tampa by train and plane' consists of the two subevents 'Peter went from Orlando to A by train' and 'Peter went from A to Tampa by plane'. Here, A refers to an arbitrary location. Subevents are refinements of the super-event. In general, event e1 is a subevent of event e2, written Subevent(e1, e2), if and only if the occurrence of e1 is a necessary condition for the occurrence of e2 and e1 occurs in the time interval of e2. Using Allen's concepts, it can be defined as follows.

Necessary and sufficient conditions for Subevent(e1, e2):

$$
\exists t_1\, \exists t_2\, (\mathrm{OCCUR}(e_1, t_1) \land \mathrm{OCCUR}(e_2, t_2) \land \mathrm{NECESSARY}(e_1, e_2) \land \mathrm{IN}(t_1, t_2)).
$$

According to the definition of subevent, the time intervals of the subevents of a given event may or may not overlap. Now, let us consider a property of subevents in which the time intervals do not overlap. Let e1, e2, ..., ei be subevents of ek with time intervals t1, t2, ..., ti and tk, respectively. An important property of subevents occurs when t1 starts exactly when tk starts, t2 starts exactly when t1 ends, ti starts exactly when t(i-1) ends and ti ends exactly when tk ends. If all subevents of an event e have this property, one says that the set of subevents is ordered. Let A be the set of subevents of event e. Using Allen's concepts, an ordered set of subevents can be defined as follows.

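Necessary and sufficient conditions for ordered(A, e, t):

$$
\begin{aligned}
(1)\quad & \exists! e_1\, \exists t_1\, (\mathrm{member}(e_1, A) \land \mathrm{occur}(e_1, t_1) \land \mathrm{occur}(e, t) \land \mathrm{STARTS}(t_1, t)) \;\land \\
(2)\quad & \exists! e_2\, \exists t_2\, (\mathrm{member}(e_2, A) \land \mathrm{occur}(e_2, t_2) \land \mathrm{FINISHES}(t_2, t)) \;\land \\
& \forall e_3\, \forall t_3\, ((\mathrm{member}(e_3, A) \land \neg\mathrm{equal}(e_3, e_1) \land \mathrm{occur}(e_3, t_3)) \Rightarrow \exists! e_4\, \exists t_4\, (\mathrm{member}(e_4, A) \land \mathrm{occur}(e_4, t_4) \land \mathrm{MEET}(t_4, t_3))).
\end{aligned}
$$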
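As a rough illustration of the ordering property, the following Python sketch checks the informal chain description (the first subevent starts with the whole event, each subevent starts when the previous one ends, and the last one ends with the whole event). It is only an approximation of the formal definition: intervals are assumed to be `(start, end)` pairs of numbers, and the function name `is_ordered` is hypothetical.

```python
def is_ordered(sub_intervals, whole):
    """Return True if the subevent intervals tile `whole` end to end.

    sub_intervals: list of (start, end) pairs, in any order.
    whole: (start, end) pair for the super-event.
    Numeric endpoints are an assumption of this sketch.
    """
    chain = sorted(sub_intervals)
    # An empty set, or an interval with no duration, cannot be ordered.
    if not chain or any(start >= end for start, end in chain):
        return False
    starts_with_whole = chain[0][0] == whole[0]
    ends_with_whole = chain[-1][1] == whole[1]
    # Every interval must end exactly where the next one starts.
    each_meets_next = all(a[1] == b[0] for a, b in zip(chain, chain[1:]))
    return starts_with_whole and ends_with_whole and each_meets_next
```

A gap, an overlap, or a subevent that starts late or ends early makes the check fail, which mirrors the roles of STARTS, FINISHES and MEET in the definition.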

The notion of ordered subevents is not only relevant in the representation of systems, but also in the principled solution of word algebra problems about motion, such as the one depicted in Fig. 1.

[Figure 1: event hierarchy.
e1 Peter went from Orlando to Leesburg
  e1.1 Peter went from Orlando to A by train
  e1.2 Peter went from A to Leesburg by plane
e2 Peter went from Leesburg to Tampa by car]

Fig. 1. The representation of 'Peter went from Orlando to Tampa through Leesburg. He made the trip from Orlando to Leesburg by train and plane. The train took 2 hours. He went from Leesburg to Tampa by car. The car took 3 hours. The entire trip took 7 hours. How long did the trip from Orlando to Leesburg take?'
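The word problem in the caption of Fig. 1 can be worked through with a short Python sketch. It assumes that the subevents of each node are ordered, so that their durations add up to the duration of the parent. The dictionary layout and the function name `fill_durations` are illustrative choices, not part of the paper.

```python
def fill_durations(node):
    """Fill in missing durations, assuming ordered subevents.

    node: dict with 'duration' (a number or None) and an optional
    'subevents' list of nodes of the same shape.
    """
    kids = node.get("subevents", [])
    for kid in kids:
        fill_durations(kid)
    if not kids:
        return node
    known = [k["duration"] for k in kids if k["duration"] is not None]
    missing = [k for k in kids if k["duration"] is None]
    if not missing and node["duration"] is None:
        # All parts known: the whole is their sum.
        node["duration"] = sum(known)
    elif len(missing) == 1 and node["duration"] is not None:
        # One part unknown: it is the whole minus the other parts.
        missing[0]["duration"] = node["duration"] - sum(known)
        fill_durations(missing[0])  # propagate the new value downward
    return node


train = {"name": "e1.1 Orlando to A by train", "duration": 2}
plane = {"name": "e1.2 A to Leesburg by plane", "duration": None}
e1 = {"name": "e1 Orlando to Leesburg", "duration": None,
      "subevents": [train, plane]}
e2 = {"name": "e2 Leesburg to Tampa by car", "duration": 3}
trip = {"name": "Orlando to Tampa through Leesburg", "duration": 7,
        "subevents": [e1, e2]}

fill_durations(trip)
# e1 (Orlando to Leesburg): 7 - 3 = 4 hours; the plane leg: 4 - 2 = 2 hours.
```

The answer falls out of the tree structure alone: the unknown is read off from the parent and the known siblings, without any axiomatic deduction.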

The subevent relationship has not been explicitly written in the arcs of the nodes; rather, the following convention was adopted throughout the entire paper. All nodes immediately descending from the root node are subevents of the root node. Thus, the notation ei.j is used to indicate that event ei.j is a subevent of ei, and ei.j.1 to indicate that ei.j.1 is a subevent of ei.j, and so on.

The meaning of necessary in all of these definitions is as follows. 'Event e1 is a necessary condition for event e2' means that e2 does not occur unless e1 occurs, or $\neg\mathrm{occur}(e_1) \Rightarrow \neg\mathrm{occur}(e_2)$, or, by contraposition, $\mathrm{occur}(e_2) \Rightarrow \mathrm{occur}(e_1)$. Trips that occur in stages lend themselves to being represented by using the subevent relation because the trip can be represented as a super-event containing all the stages and then the super-event can be decomposed into subevents containing the different substages. Thus, if one considers the representation of the event 'Peter went from Orlando to Tampa through Leesburg' depicted in Fig. 1, event e1, 'Peter went from Orlando to Leesburg', is a necessary condition for the event in the root node to occur. This is so because, although Peter could have gone from Orlando to Tampa through many other routes, he could not have gone from Orlando to Tampa through Leesburg unless he had gone from Orlando to Leesburg.

3. The strong-during and strong-precedence relationships

If an event, say e1, occurs during the time interval of another event, say e2, and the occurrence of e2 is a necessary condition for e1 to occur, then one says e1 STRONG-DURING e2. In Allen's terms, the following can be written.

Necessary and sufficient conditions for STRONG-DURING(e1, e2):

$$
\exists t_1\, \exists t_2\, (\mathrm{OCCUR}(e_1, t_1) \land \mathrm{OCCUR}(e_2, t_2) \land \mathrm{DURING}(t_1, t_2) \land \mathrm{NECESSARY}(e_2, e_1)).
$$

This relationship may be easily confused with the subevent relationship. If e1 is a subevent of e2, e1 is a necessary condition for e2 to occur. However, if e1 is STRONG-DURING e2, e2 is a necessary condition for e1 to occur. For instance, in the sentence 'while the blood passes through the lungs, it picks up oxygen', the event in 'the blood passes through the lungs' is a necessary condition for the occurrence of the event in 'the blood picks up oxygen', and the latter is fully contained in the time interval of the former. Note that there are events connected by 'while' that do not map into STRONG-DURING, such as 'while Mary ate, Peter read the newspaper' and many others. However, in systems described by a flow relationship, 'while' maps into STRONG-DURING. (See Section 10 for a detailed explanation of this.)

The relationship of PRECEDENCE for time intervals is defined as:

$$
\mathrm{PRECEDENCE}(t_1, t_2) \Leftrightarrow \mathrm{BEFORE}(t_1, t_2) \lor \mathrm{MEET}(t_1, t_2) \lor \mathrm{OVERLAP}(t_1, t_2).
$$

Now, STRONG-PRECEDENCE for events is defined as follows.

Necessary and sufficient conditions for STRONG-PRECEDENCE(e1, e2):

$$
\exists t_1\, \exists t_2\, (\mathrm{OCCUR}(e_1, t_1) \land \mathrm{OCCUR}(e_2, t_2) \land \mathrm{PRECEDENCE}(t_1, t_2) \land \mathrm{NECESSARY}(e_1, e_2)).
$$
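The two strong relationships differ mainly in the direction of the necessity condition, which a small Python sketch may make easier to see. The sketch assumes numeric `(start, end)` intervals, a dictionary `when` mapping event names to intervals, and a set `necessary` of pairs `(a, b)` meaning 'a is a necessary condition for b'. All of these names and the interval values are made up for illustration.

```python
def before(t1, t2):
    return t1[1] < t2[0]


def meet(t1, t2):
    return t1[1] == t2[0]


def overlap(t1, t2):
    return t1[0] < t2[0] < t1[1] < t2[1]


def during(t1, t2):
    return t2[0] < t1[0] and t1[1] < t2[1]


def precedence(t1, t2):
    # PRECEDENCE is the disjunction of BEFORE, MEET and OVERLAP.
    return before(t1, t2) or meet(t1, t2) or overlap(t1, t2)


def strong_during(e1, e2, when, necessary):
    # e1 happens inside e2, and e2 is necessary for e1.
    return during(when[e1], when[e2]) and (e2, e1) in necessary


def strong_precedence(e1, e2, when, necessary):
    # e1 precedes e2, and e1 is necessary for e2.
    return precedence(when[e1], when[e2]) and (e1, e2) in necessary


# Hypothetical timings for two events from the blood circulation text.
when = {"passes_through_lungs": (0, 10), "picks_up_oxygen": (2, 8)}
necessary = {("passes_through_lungs", "picks_up_oxygen")}
```

With these values, `strong_during("picks_up_oxygen", "passes_through_lungs", when, necessary)` holds, while swapping the two events does not.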

Examples of this relationship may be found in Figs. 3 and 4.


[Figure 2: parent nodes e1, e2, ..., ei merge into a while-merge node, whose children are the nodes k1, k2, ..., kj.]

Fig. 2. Illustration of the while-merge relationship.

4. The while-merge relationship

An event, say e1, is a while-merge event if there are at least two events, say e2 and e3, which occur simultaneously (they occur in the same time interval) and the simultaneous occurrence of these two events is a necessary condition for e1 to occur and e1 occurs within the time interval of e2 and e3. The word 'merge' in the 'while-merge' relationship is a convenient way to describe the graphical representation of the while-merge relationship (see the following), but it does not add any meaning to the relationship. The meaning of the relationship is expressed by the following formal definition. The necessary and sufficient conditions for an event to be a while-merge event are defined as follows.

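Necessary and sufficient conditions for WHILE-MERGE(e1, t1):

$$
\exists e_2\, \exists e_3\, \exists t_1\, \exists t_2\, (\mathrm{OCCUR}(e_2, t_2) \land \mathrm{OCCUR}(e_3, t_2) \land \mathrm{OCCUR}(e_1, t_1) \land \mathrm{IN}(t_1, t_2) \land \mathrm{NECESSARY}(e_2, e_1) \land \mathrm{NECESSARY}(e_3, e_1)).
$$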
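A minimal Python sketch of this definition follows, under the same kind of assumptions as before: intervals are numeric `(start, end)` pairs, `when` maps event names to intervals, and `necessary` is a set of pairs `(a, b)` meaning 'a is necessary for b'. The event names and timings are invented for the fish respiration example.

```python
from itertools import combinations


def in_(t1, t2):
    # IN: t1 is wholly contained in t2 (DURING, STARTS or FINISHES).
    during = t2[0] < t1[0] and t1[1] < t2[1]
    starts = t1[0] == t2[0] and t1[1] < t2[1]
    finishes = t1[1] == t2[1] and t1[0] > t2[0]
    return during or starts or finishes


def is_while_merge(e1, when, necessary):
    """True if two distinct simultaneous events are both necessary for e1
    and e1 occurs within their shared time interval."""
    parents = sorted(a for (a, b) in necessary if b == e1 and a != e1)
    for e2, e3 in combinations(parents, 2):
        simultaneous = when[e2] == when[e3]
        if simultaneous and in_(when[e1], when[e2]):
            return True
    return False


# Hypothetical timings: water and blood pass through the gills together.
when = {
    "water_over_gills": (0, 10),
    "blood_through_gills": (0, 10),
    "oxygen_filters_to_blood": (1, 9),
}
necessary = {
    ("water_over_gills", "oxygen_filters_to_blood"),
    ("blood_through_gills", "oxygen_filters_to_blood"),
}
```

If either parent is removed from `necessary`, or the two parents are given different intervals, the check fails, because the definition requires both simultaneity and joint necessity.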

In a hierarchy of nodes representing events, the while-merge relationship may be depicted as a relationship connecting two or more nodes to one or more nodes, as illustrated in Fig. 2.

The relationship of simultaneity is indicated in the figure by writing sim. Let us call the nodes e1, e2, ..., ei, which merge into the while-merge node, the parent nodes of the while-merge node and let us call the nodes k1, k2, ..., kj the children of the while-merge node. The meaning of the while-merge relationship is that the simultaneous occurrence of the parent nodes of the while-merge node is a necessary condition for the occurrence of the children of the while-merge node and that the time interval of each of its children nodes is in the time interval of each of its parent nodes. Since the parents of the while-merge node are simultaneous, that is, all of them occur in the same time interval, say t1, then each time interval of k1, k2, ..., kj is in the time interval t1.

[Figure 3: Fish Respiration (event hierarchy; the relationship ellipses are not recoverable from the extracted text, and the label e3 is inferred).
e1 water flows from outside to outside through gills
  e1.1 water flows to the gills from outside
    e1.1.1 water flows into mouth from outside
    e1.1.2 water flows from mouth to pharynx
    e1.1.3 water flows from pharynx to gills
  e1.2 water flows over the gills
  e1.3 water flows from the gills to outside
e2 blood flows from body to body through gills
  e2.1 blood flows to the gills from the body
  e2.2 blood passes through gills
  e2.3 blood flows to the body from the gills
e3 oxygen filters through membrane from water to blood
e4 CO2 filters through membrane from blood to water]

Fig. 3. Representation of fish respiration.

[Figure 4: Blood Circulation (event hierarchy; the labels e3, e4, e5 and e7 and parts of their wording are illegible in the extracted text and are inferred from the numbering).
e1 blood flows from body to heart through vein
e2 blood passes through heart
  e2.1 blood flows from vein to right atrium
  e2.2 blood flows from r-atr to r-ventrl through valve1
  e2.3 blood flows from r-ventrl to pul-artery through valve2
e3 blood flows ... through pul-artery
e4 blood passes through lung
  e4.1 blood picks up oxygen
e5 blood flows ... through pul-vein
e6 blood passes through heart
  e6.1 blood flows from pulm. vein to left atrium
  e6.2 blood flows from l-atr to l-ventrl through valve3
  e6.3 blood flows from l-ventrl to aorta through valve4
e7 blood flows ... heart to body]

Fig. 4. Representation of blood circulation.

The WHILE-MERGE relationship allows the representation of the events: 'while blood and water are flowing over the gills, oxygen in the water passes to the blood and carbon dioxide in the blood passes to the water' (see Fig. 3). Using the while-merge relationship, one can grasp the fact that for oxygen in the water to pass to the blood, it is necessary that blood and water pass simultaneously through the gills. The same can be said of the event 'carbon dioxide in the blood passes to the water.'

5. The representation of fish respiration, blood circulation, urinary system and the intake stroke of an internal combustion engine

The representation of fish respiration is depicted in Fig. 3. It is essential to emphasize that the wording of the events in the nodes is greatly simplified and does not correspond to the actual representation of individual events in LTM. Section 9 explains how the individual events are represented in LTM. Note that the relationships are enclosed in ellipses and they are not nodes of the hierarchy. The relationships STRONG-PRECEDENCE and SIMULTANEITY are indicated in the figure as sprec and sim, respectively.

The representation consists of two main subevents: e1, 'water flows from outside to outside through gills', and e2, 'blood flows from the body to the body through gills'. These two subevents consist of the following subevents: e1.1, e1.2, e1.3, and e2.1, e2.2 and e2.3.

Fig. 4 contains the representation of blood circulation as described in the text. The relationship STRONG-DURING is indicated by S-DUR. Note that the top nodes e1-e7 are all related by the relationship strong-precedence. Figs. 5 and 6 depict the representation of the urinary system and Fig. 7 the representation of the first stroke of the internal combustion engine.

6. Answering diagnostic or what-if questions

[Figure 5: Urinary System (event hierarchy).
e1 blood flows to kidneys
e2 blood passes through kidneys
  e2.1 blood flows to glomerulus
  e2.2 blood passes through glomerulus
  e2.3 blood flows from glomerulus to capillaries
  e2.4 blood flows from capillaries to veins
  e2.5 blood flows from veins out of kidney
e3 blood flows out of kidneys
e4 urine flows out of kidneys
e5 blood called filtrate filters through wall into capsule (continued in figure 6)]

Fig. 5. Representation of blood flow in the urinary system.

The traversal of the hierarchy of events is performed by a standard depth-first search algorithm.

Algorithm for traversing the subevent hierarchy
Input: Q, a question
Output: E, an event in the hierarchy such that answers(E, Q)
Open := Root Node
While Open is not empty
begin
    Remove the node in front of Open. Call it E
    If answers(E, Q) then exit the algorithm with the parents and children of E
    else begin
        Append the subevents of E to the front of Open
    end
end
Exit with failure
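The depth-first traversal can be sketched in Python as follows. This is a simplified illustration, not the paper's implementation: the `Event` class is hypothetical, the test `answers` is passed in as an arbitrary predicate, and the sketch returns only the matching event rather than its parents and children.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    """A node of the subevent hierarchy (illustrative structure)."""
    label: str
    subevents: list = field(default_factory=list)


def find_event(root, answers):
    """Depth-first search for the first event E with answers(E) true.

    `answers` stands in for the paper's answers(E, Q) test with the
    question Q already fixed. Returns None on failure.
    """
    open_list = [root]
    while open_list:
        event = open_list.pop(0)  # remove the node in front of Open
        if answers(event):
            return event
        # Putting subevents at the front makes the search depth-first.
        open_list = event.subevents + open_list
    return None


# Part of the fish respiration hierarchy of Fig. 3.
fish = Event("fish respiration", [
    Event("e1", [
        Event("e1.1", [Event("e1.1.1"), Event("e1.1.2"), Event("e1.1.3")]),
        Event("e1.2"),
        Event("e1.3"),
    ]),
    Event("e2", [Event("e2.1"), Event("e2.2"), Event("e2.3")]),
])

match = find_event(fish, lambda ev: ev.label == "e1.1.3")
```

Because subevents are placed at the front of the open list, the whole subtree under e1.1 is explored before e1.2 is examined.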

Next, we explain under which circumstances an event, say E, answers a question, say Q. A what-if question has the form 'what if E', where E is an event. Some examples are: 'what if the pharynx remains closed?' and 'what if no water reaches the gills?'. Two cases are considered in answering this type of question. Case 1: a what-if question can explicitly negate the occurrence of an event, e.g. 'what if no water reaches the gills?'. Case 2: a what-if question may negate a necessary condition for an event to occur, e.g. 'what if the pharynx remains closed?'.

6.1. Case 1: what if not E

Let S be a system and consider the question: 'What if not E?', where E is an event in system S. For example, if S is fish respiration then the question 'what if no water reaches the gills from the pharynx?' negates event e1.1.3 in S (see Fig. 3).

In cases like this one, the question is answered on the basis of the following observations.

Let e1 and e2 be any events in system S and the question be: 'what if not E.' If event e1 is a subevent of e2 and the E in the question matches e1, then e2 cannot take place either because e1 is a necessary condition for e2 and the question says that e1 cannot take place.

If event e1 STRONG-PRECEDE event e2 in system S and the E in the question matches e1, then e2 cannot take place either because e1 is necessary for e2 and the question asserts that e1 cannot occur. The same reasoning applies to every pair of events e1 and e2 such that e1 STRONG-DURING e2 in system S.

If the events e1, e2, ..., ek in system S are connected by the while-merge relationship to event a1, where a1 is a child of the while-merge node, and the E in the question matches any one of e1, e2, ..., ek, then a1 cannot take place either because the simultaneous occurrence of each one of e1, e2, ..., ek is a necessary condition for a1 to occur (definition of the while-merge).

filtrate flows through the kidneys (continued from figure 5)
  e5.1 filtrate flows from capsule into tubes
  e5.2 filtrate passes through tubes
  e5.3 filtrate called urine flows from the tubes to the urethra, out of the kidneys
e6 most glucose in the filtrate filters to capillaries 2.3
e7 most water in the filtrate filters to the capillaries 2.3
e8 most amino acids in the filtrate filters to the capillaries 2.3
e9 glucose combines with blood in the capillaries
e10 water combines with blood in the capillaries
e11 amino acids combine with blood in the capillaries

Fig. 6. Representation of filtrate flow in the urinary system.

Thus, let us consider again the question: what if no water gets to the gills from the pharynx? In this case, the hierarchy representing fish respiration is searched. The event E in the question matches event e1.1.3 (Fig. 3), so it follows that water does not reach the gills (e1.1.3 is a subevent of e1.1), water does not flow over the gills (e1.1 strong-precede e1.2), oxygen in the water cannot combine with blood and carbon dioxide cannot combine with water (e1.2 is necessary for e3 and e4 because of the while-merge relationship), water will not flow out of the gills (e1.2 strong-precede e1.3), and fish respiration does not take place. However, blood may still flow in and out of the gills (e2.1, e2.2, e2.3).
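The three observations of Case 1 all say that one event is a necessary condition for another, so the inference can be sketched as a closure over 'is necessary for' links. The Python fragment below is a minimal sketch under that reading. The function name `blocked_events` and the encoding of the relations as lists of pairs are assumptions made here, and the links listed for fish respiration are those mentioned in the text, not a complete transcription of Fig. 3.

```python
from collections import deque


def blocked_events(negated, subevent_of, strong_precede, strong_during, while_merge):
    """Return every event that cannot occur once `negated` does not occur."""
    necessary_for = {}

    def add(condition, dependent):
        necessary_for.setdefault(condition, set()).add(dependent)

    for child, parent in subevent_of:        # a subevent is necessary for its parent
        add(child, parent)
    for first, second in strong_precede + strong_during:
        add(first, second)
    for sources, targets in while_merge:     # each merged event is necessary for each child
        for s in sources:
            for t in targets:
                add(s, t)

    blocked, queue = {negated}, deque([negated])
    while queue:
        for dependent in necessary_for.get(queue.popleft(), ()):
            if dependent not in blocked:
                blocked.add(dependent)
                queue.append(dependent)
    return blocked


# Links mentioned in the text for fish respiration (assumed encoding).
subevent_of = [("e1.1.1", "e1.1"), ("e1.1.2", "e1.1"), ("e1.1.3", "e1.1"),
               ("e1.1", "e1"), ("e1.2", "e1"), ("e1.3", "e1"),
               ("e2.1", "e2"), ("e2.2", "e2"), ("e2.3", "e2"),
               ("e1", "fish-respiration"), ("e2", "fish-respiration")]
strong_precede = [("e1.1", "e1.2"), ("e1.2", "e1.3")]
while_merge = [(["e1.2", "e2.2"], ["e3", "e4"])]

blocked = blocked_events("e1.1.3", subevent_of, strong_precede, [], while_merge)
```

Because the links are followed only from a necessary condition to the event that depends on it, the blood-flow events e2.1, e2.2 and e2.3 are not reached, which agrees with the remark that blood may still flow in and out of the gills.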

6.2. Case 2: the question negates a necessary condition for an event to occur

The question: what would happen if the pharynx remains closed? negates a necessary condition for the events water flows into and out of the pharynx to occur. These questions are answered by traversing the hierarchy, searching for events in which pharynx is an argument. For each such event, it is checked whether the relationship R in the what-if question negates a necessary condition for the event to take place. If that is the case, then the event cannot take place and the criteria of Case 1 are used to find out other events that cannot take place. Whether a relationship is a necessary condition for an event is determined from the definition of the relationship in the event. The definition of relationships such as flow includes, among other things, the necessary conditions for the corresponding action to take place. For instance, the definition of flow includes, as a necessary condition for the action flow to take place, that if the source, also called the from-loc role, or the destination, also called the to-loc role, are receptacles, then they must be open.

7. Questions about properties undergoing change

Since the objects in a system may undergo changes at different times during the operation of the system, a question that asks about a property of an object related to a system cannot be answered by simply accessing the representation structures of the object; rather, it must be answered by traversing the event hierarchy representing the system.

Intake Stroke
  e1 the piston moves down
  e2 the intake valve opens
  e3 the piston reaches the [...]
  e4 the intake valve closes
  e5 the downward movement of the piston creates a partial vacuum
  e6 the fuel mixed with air rushes into the cylinder
    e6.1 the fuel flows from the fuel tank into the intake valve
    e6.2 fuel flows from the intake valve into the cylinder

Fig. 7. Representation of the first stroke in the internal combustion engine.

Consider, for example, the question: does blood contain carbon dioxide? in the context of fish respiration. Whether blood contains carbon dioxide or not depends on what stage the system is in, that is, it depends on 'which blood' the question refers to. Because during the operation of the system, blood may undergo many changes, some of which may indirectly affect its composition, the question requires that the hierarchy be traversed in order to find out when and where blood contains carbon dioxide. The algorithm should be able to answer the question as follows: yes, blood that flows through the gills contains carbon dioxide because blood releases carbon dioxide in the gills. As the hierarchy is traversed, the algorithm searches for an event implying blood contains carbon dioxide. The traversal takes the algorithm to event e4, carbon dioxide filters through the membrane in the gills from the blood to the water. Since the primitive of this relationship is filter-through, the analytical rules stored under it are fired (see Ref. [23] for a detailed discussion of analytical rules). One of these analytical rules is 'if X filters through Y from Z then Z contains X', which allows the algorithm to conclude that blood contains carbon dioxide when event e4 occurs. However, e4 occurs as event e2.2 (blood passes through the gills) occurs, which yields the desired answer. Because there may be other relationships implying blood contains carbon dioxide, the algorithm continues traversing the hierarchy until all its nodes are searched. In our implementation, a feature à la Prolog asks the user if he/she wants to find out more about blood and carbon dioxide in the context of fish respiration.
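The use of an analytical rule during the traversal can be sketched as follows. This is an illustration only: the table `ANALYTICAL_RULES`, the function `events_implying` and the flat encoding of events as (name, primitive, roles) triples are assumptions made for the example, and only the single rule quoted in the text is encoded.

```python
# Analytical rules indexed by primitive. Each rule maps the thematic roles
# of an event to the facts it implies (hypothetical encoding).
ANALYTICAL_RULES = {
    # 'if X filters through Y from Z then Z contains X'
    "filter-through": lambda roles: [("contains", roles["from-loc"], roles["theme"])],
}


def events_implying(events, fact):
    """Visit every event and collect those whose analytical rules yield `fact`."""
    hits = []
    for name, primitive, roles in events:
        rule = ANALYTICAL_RULES.get(primitive)
        if rule is not None and fact in rule(roles):
            hits.append(name)
    return hits


events = [
    ("e2.2", "flow", {"theme": "blood", "medium": "gills"}),
    ("e3", "filter-through", {"theme": "oxygen", "medium": "membrane",
                              "from-loc": "water", "to-loc": "blood"}),
    ("e4", "filter-through", {"theme": "carbon dioxide", "medium": "membrane",
                              "from-loc": "blood", "to-loc": "water"}),
]

answer = events_implying(events, ("contains", "blood", "carbon dioxide"))
```

The function visits all events rather than stopping at the first hit, mirroring the remark that the traversal continues until all nodes are searched.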

8. Temporal questions

Some examples of temporal questions are: what occurs when water passes over the gills?, what occurs while water passes over the gills?, what occurs before water enters the gills? and what occurs after blood releases carbon dioxide? These types of questions are handled naturally by our method because the hierarchy is indexed using temporal relations. These questions are answered by traversing the hierarchy searching for the event underlying the question. They are formed as follows: (what-occurs x E), where x denotes one of 'while', 'before' or 'after', and E denotes an event. For example, the question: what occurs while water passes over the gills? is formed as: (what-occurs while water flow (medium (gill))). The algorithm for (what-occurs while E) retrieves all those events which are linked by an IN or DURING relationship to E and also retrieves all subevents of E.
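A minimal Python sketch of the retrieval for (what-occurs while E) is given below. The function name, the encoding of temporal links as (relation, event, target) triples and the sample links are all assumptions made for illustration. In particular, the sketch assumes that 'all subevents of E' means all descendants, and the DURING links shown are illustrative data, not taken from Fig. 3.

```python
def what_occurs_while(event, temporal_links, subevents):
    """Events linked to `event` by IN or DURING, followed by all its subevents."""
    linked = [a for rel, a, b in temporal_links
              if b == event and rel in ("IN", "DURING")]

    def descendants(e):
        for child in subevents.get(e, []):
            yield child
            yield from descendants(child)

    return linked + list(descendants(event))


# Illustrative data only.
subevents = {"e1": ["e1.1", "e1.2", "e1.3"],
             "e1.1": ["e1.1.1", "e1.1.2", "e1.1.3"]}
temporal_links = [("DURING", "e3", "e1.2"),
                  ("DURING", "e4", "e1.2"),
                  ("PRECEDE", "e1.1", "e1.2")]

during_flow_over_gills = what_occurs_while("e1.2", temporal_links, subevents)
```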

9. Relating the event-based representation to the object-based representation

See Ref. [16] for a detailed discussion of some aspects very briefly explained in this section. In KL-SNOWY, our knowledge representation language, every argument in an event is indexed as an object, which is then linked through is-a relationships to the other objects in the hierarchy of concepts. Links are also created connecting the objects to the relationship itself, which is represented in a separate structure. Assume that the concepts 'whale' and 'Antarctic' are already in LTM. Thus, given the sentence whales live in the Antarctic, the following representation is generated for the objects or arguments in the relation:

whale (habitat (antarctic a01))
antarctic (habitat-of (whale a01))

The relationship itself, a01, is represented separately as will be indicated in the following text. Note that the inverse relationship of habitat is stored under Antarctic. In general, all the n entities of an n-ary relationship will be represented as objects and links from these objects will be created pointing to the representation of the relationship itself. Concepts denoted by complex noun groups, e.g. 'sea mammals', and restrictive modifiers, e.g. 'animals that live in the sea', 'the gills of fish', etc. are collapsed into a single concept by the following method. The concept 'the gills of fish' is represented by creating the concept, say, y1, an arbitrary name, which is defined by saying that it is a class of gills which are part of fish. Formally:

y1 (cf (is-a (gill) part-of (fish)))

The slot cf contains the necessary and sufficient conditions that define the concept y1. Thus, 'the membranes in the gills of fish' is represented as:

y2 (cf (is-a (membrane) part-of (y1)))

The recognizer and classifier algorithms, which search for incoming concepts in LTM, check the content of these cf slots in order to find out if a given concept subsumes another. The names of the concepts are, of course, arbitrary and not used at all. Let us consider event e3 in Fig. 3. The exact wording of the actual representation of event e3 is: oxygen in the water in the gills of fish filters from the water in the gills of fish through the membranes in the gills of fish to the blood in the gills of fish.

The following four complex concepts will be created according to the ideas briefly explained in this section: 'oxygen in the water in the gills of fish', 'water in the gills of fish', 'membranes in the gills of fish', and 'blood in the gills of fish'. Let us call these concepts x1, x2, x3 and x4, respectively. These names are, of course, arbitrary. All of these concepts will be represented as separate objects and pointers will be created back to the 5-ary relationship filter, which is represented as:

(a01
  relation (filter)
  instance-of (event)
  event-of (fish-respiration)
  arguments (unknown, x1, x2, x3, x4)
  (actor (unknown (q (?)))
   theme (x1 (q (some)))
   from-loc (x2 (q (some)))
   medium (x3 (q (?)))
   to-loc (x4 (q (some)))))

Note that the grammatical subject 'oxygen in the water in the gills of fish' of event e3 has become the theme in the logical form because in our ontology the actor of the relationship filter must be an animate being. The actor of the relationship is unknown. The slot q stands for quantifier and indicates the quantification of the concept. In our example, all mass nouns were existentially quantified, while countable nouns were quantified with a question mark indicating that the value of the quantifier is unknown. The scope of these quantifiers is from left to right (see Ref. [16] for a complete discussion of this). The concept fish-respiration is represented as:

fish-respiration (cf (is-a (respiration)) (pertaining-to (fish)))

respiration (is-a (process))

Now, the connection between the event-based and the object-based representations can be established. The concept x1, 'oxygen in the water in the gills of fish', and the concept x2, 'water in the gills of fish', are represented in the following hierarchies.

gas
 |
is-a
 |
oxygen
 |
is-a
 |
x1 --> (filter (a01))
       (processes (fish-respiration blood-circulation))

water
 |
is-a
 |
x2 --> (filter (a01))
       (processes (fish-respiration))

Similar representations are created for the concepts x3 and x4. The cf slots of the concepts x1 and x2 are not shown. The reader only needs to know that they are generated in a similar manner to the concept 'the membranes in the gills of fish.' The representation of a question is identical to the representation produced for a declarative sentence, except for a question slot in the former. Now, consider the question: which gas filters from water to the blood of fish? Note that although the question does not exactly match the relationship a01, that relationship is found by inheritance applied to all the arguments of the relationship underlying the question. Unknown arguments in the question match known arguments of the relationship in LTM [16]. However, the question: which places does oxygen pass through before getting to the gills of fish? cannot be answered unless the processes in which a concept intervenes are indexed under that concept. Consequently, concepts x1, x2, x3 and x4 have pointers to the processes fish-respiration and blood-circulation, among others. This is indicated in the representation by writing the slot processes. If the answer to a question cannot be obtained by matching the relationship in the question, then the processes are traversed searching for the answer as indicated in previous sections. Prior to giving the criteria that make a process a candidate for traversal, the following definitions are needed. Let A and B be any concepts. The set defining-concepts of A is defined as follows:

1. A is in the set defining-concepts of A.
2. If B is in the set defining-concepts of A, so are all concepts in any of the relationships in the cf slot of B.
3. The only concepts in the set defining-concepts of A are those obtained by (1) and (2).

Thus, the complex concept 'the membranes in the gills of fish', defined previously as y2, has the following defining-concepts set: {y2, membrane, y1, gills, fish}.
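The recursive definition amounts to a closure over the cf slots, which can be sketched in a few lines of Python. The dictionary `CF` and the function `defining_concepts` are hypothetical names. The cf slots are encoded as lists of (relationship, concept) pairs, and the concept name 'gill' is written as it appears in the cf slot of y1.

```python
# cf slots of the two complex concepts defined in the text.
CF = {
    "y1": [("is-a", "gill"), ("part-of", "fish")],
    "y2": [("is-a", "membrane"), ("part-of", "y1")],
}


def defining_concepts(a, cf):
    """Closure of rules (1) and (2): A itself plus every concept reachable through cf slots."""
    result, stack = set(), [a]
    while stack:
        concept = stack.pop()
        if concept in result:
            continue
        result.add(concept)                                   # rule (1) for A, rule (2) otherwise
        stack.extend(c for _, c in cf.get(concept, []))       # concepts in the cf slot
    return result


y2_set = defining_concepts("y2", CF)
```

Simple concepts such as 'membrane' or 'fish' have no cf slot in this sketch, so the expansion stops at them.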

Let A and B be concepts in LTM. A is a subconcept of B if there is a sequence of concepts a1, a2, a3, ..., an in LTM such that a1 = A and an = B and the relationship ai is-a a(i+1) exists in LTM for all i = 1, ..., n - 1. If A is a subconcept of B, then B is a superconcept of A.

Finally, the criteria that trigger the traversal of a process can be given. Let Q be any question. Let B be a concept that is an argument in any of the relationships in the representation of Q. Let A be the set defining-concepts of the concept B. A process will be traversed in order to answer question Q if there is a B in Q and an A such that at least one of the following cases holds:

1. There is a concept in A that is also an argument in any of the relationships in the cf slot defining the process.

2. There is a concept in A that is also a subconcept of an argument in any of the relationships in the cf slot defining the process.

3. There is a concept in A that is also a superconcept of an argument in any of the relationships in the cf slot defining the process.
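The three cases can be checked mechanically once the defining-concepts set and the is-a links are available. The following Python sketch is one possible reading, not the implemented classifier: the function names, the dictionaries `CF` and `IS_A`, and the concepts y3 ('the gills of sharks') and y4 ('the blood of animals') are hypothetical. Subconcept is taken in the strict sense here, so that Case 1 and Case 2 remain distinct.

```python
def defining_concepts(a, cf):
    """A plus every concept reachable through cf slots."""
    result, stack = set(), [a]
    while stack:
        c = stack.pop()
        if c not in result:
            result.add(c)
            stack.extend(x for _, x in cf.get(c, []))
    return result


def superconcepts(c, is_a):
    """All strict superconcepts of c, following is-a links transitively."""
    seen, stack = set(), list(is_a.get(c, []))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(is_a.get(p, []))
    return seen


def traversal_cases(question_args, process_args, cf, is_a):
    """Return the set of cases (1, 2, 3) that make the process a candidate."""
    cases = set()
    for b in question_args:
        for a in defining_concepts(b, cf):
            for p in process_args:
                if a == p:
                    cases.add(1)                      # same concept
                elif p in superconcepts(a, is_a):
                    cases.add(2)                      # a is a subconcept of p
                elif a in superconcepts(p, is_a):
                    cases.add(3)                      # a is a superconcept of p
    return cases


CF = {"y1": [("is-a", "gill"), ("part-of", "fish")],
      "y3": [("is-a", "gill"), ("part-of", "shark")],     # hypothetical
      "y4": [("is-a", "blood"), ("part-of", "animal")]}   # hypothetical
IS_A = {"shark": ["fish"], "fish": ["animal"], "mammal": ["animal"]}
FISH_RESPIRATION_ARGS = ["respiration", "fish"]   # arguments in its cf slot

gills_of_fish = traversal_cases(["oxygen", "y1"], FISH_RESPIRATION_ARGS, CF, IS_A)
```

An empty result means that the process is not a candidate for traversal for that question.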

Case (1) is the simplest case because it does not require a tree traversal through the is-a links. Given the question: which places does oxygen pass through before getting to the gills of fish?, one can see that the concept fish appears in the cf slot of fish-respiration as given previously and also in the set defining-concepts of the concept 'the gills of fish', which is an argument of the relationship, getting, in the representation of the question. Let us consider case (2). The question: which places does oxygen pass through before getting to the gills of sharks? will also make fish-respiration a candidate for traversal because shark is a subconcept of fish. Case (3) is used in questions like: which places does oxygen pass through before getting into the blood of animals? Let us assume that the concept oxygen has pointers to the processes fish-respiration and mammal-respiration and that fish and mammal are subconcepts of the concept animal in LTM. Because animal is a superconcept of the concepts fish and mammal, the algorithm asks the user to choose one of the concepts. If the user does not select any concept, the algorithm will try to find an answer to the question by traversing the two processes.

A frequent misconception of the cases enumerated in the preceding text is caused by questions like: which places does oxygen pass through before getting to the gills? or which places does oxygen pass through before getting to the blood? For these questions, the algorithm will not know which process to traverse because there is no mention of fish or any other animal in the questions. However, the questions have two definite references ('the gills' and 'the blood') that need to be solved by the natural language understanding system prior to producing the final representation of the question. In other words, the natural language understanding system will need to find out whose gills the question refers to. Note that 'the blood' or 'the gills' in the question may refer to any animal and the actual reference can only be decided by examining the context of the dialogue between the user and the system. They could have been talking about frogs, grasshoppers, etc.

10. Algorithms for constructing the subevent hierarchy

In Sections 6-8, it was explained how the representation formalisms can be used to draw inferences and answer questions. The question or query can be a formal one like those formulated by an expert system needing some piece of knowledge, or the query can be expressed in natural language. As indicated in Section 1, the main intent in designing these representation formalisms is automating the acquisition of knowledge from texts. In this section, we gather some evidence to justify this claim by showing how the temporal event hierarchies can be constructed from natural language. The goal is to read, say, a description about fish respiration and build automatically a representation very close to the one depicted in Fig. 3. The input to the algorithms described in this section is the logical form of the sentence, which is a representation of context-independent meaning. For the purposes of this paper, the reader needs only to know that the logical form consists of a predicate, or verbal concept, and some abstract semantic relationships, called thematic roles. The roles referred to in this paper are the actor, the animate agent that performs the action; the theme, the thing that suffers the action of the event; the to-loc, the final location in an event expressing a change of location; and the from-loc, the original location. Thus, in the sentence: 'the car broke', the grammatical subject 'the car' is the theme of the event; in the sentence 'Peter flew from Tampa to Orlando', 'Peter' is the actor, 'from Tampa' is the from-loc and 'to Orlando' is the to-loc.

The logical form is constructed by the semantic interpretation algorithm described in Ref. [15]. The input to the semantic interpreter is a partial parse, one in which structural ambiguity caused by modifier attachment is left unresolved. Semantic interpretation is centered around the representation of verbal concepts, or predicates, which are organized into a classification hierarchy. The representation of the predicates contains information about their thematic roles, the syntactic relationships that realize them and the prepositions claimed, or licensed, by the predicates and whether they license them strongly or weakly. For instance, the predicate cause-change-of-location licenses the preposition 'into' strongly, meaning that a PP (prepositional phrase) headed by the preposition 'into' will be attached to the verb and will become the to-loc thematic role. Prepositions that are strongly licensed, or claimed by a predicate, are solely attached to the verb without any other considerations, while prepositions weakly claimed by a predicate are attached to the verb, but they also may be attached to other constituents. If that is the case, heuristic rules decide among competing constituents.

Besides the representation of predicates, the semantic interpreter uses rules for determining the meaning of the verb, called VM rules. These rules use the syntactic realizations of words and their semantic category to determine the meaning of the verb, or predicate. These rules are activated every time the parser parses a constituent. The algorithm delays all semantically important decisions (e.g. attaching PPs, identifying thematic roles, etc.) until the meaning of the verb is determined. Once the meaning of the verb is determined, procrastination ends and the interpreter attaches PPs and determines thematic roles by matching the constituents already parsed and those yet to be parsed against the entries in the representation of the predicate. See Ref. [15] for a detailed presentation of the interpretation algorithm, evaluation and a discussion of related semantic interpretation algorithms.

There is a close connection between the construction of the subevent hierarchies and the role that classifiers play in terminological languages [24]. The problem to be solved by classifiers is where to place a concept in a hierarchy of concepts. The problem for the constructor of the subevent hierarchies is where to place an event in a hierarchy of events. Classifier algorithms are based on the meaning of the is-a relationship. In the case of the event hierarchies discussed in this paper, the construction of the event hierarchies depends on the meaning of the primitive relationship in the events describing the system. The algorithms given later are constructed for systems described by the relationship flow.

Prior to presenting the algorithms, let us consider the following definitions. In all these definitions, e1, e2, ..., ei refer to events. We use the notation ei(<thematic-role>), where <thematic-role> stands for any thematic role, to mean the thematic role of event ei. The representation of flow is similar to the representation of filter in the previous section. That is, the grammatical subject of flow becomes its theme in the logical form. Thus, for the event e1, water flows from the pharynx to the gills, we have e1(theme) = water, e1(from-loc) = pharynx and e1(to-loc) = gills.

CONNECTED(e1, e2) ⇔ e1(theme) = e2(theme) ∧ e1(to-loc) = e2(from-loc)

IMMEDIATELY-PRECEDE(e1, e2) ⇔ precedence(e1, e2) ∧ ¬∃x (event(x) ∧ ¬equal(x, e1) ∧ precedence(x, e2) ∧ precedence(e1, x))

IMMEDIATELY-CONNECTED(e1, e2) ⇔ immediately-precede(e1, e2) ∧ connected(e1, e2)

Let A = {e1, e2, ..., ei} be a set of events. A is an ISLAND ⇔ connected(e1, ei) ∧ ∀x (member(x, A) ∧ ¬equal(x, e1) → ∃y (member(y, A) ∧ (strong-precedence(y, x) ∨ precedence(y, x))))

Note that for a set of events to form an island, all events in the set except the first one must be connected by a strong-precedence or a precedence relationship and the first event in the set must be connected to the last event in the set. Some examples given in the following will illustrate this definition.
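These definitions translate almost directly into predicates. The Python sketch below is an illustration under assumptions: the class `Flow`, the function names and the way precedence is supplied (a caller-provided predicate `precedes` that covers both precedence and strong-precedence) are introduced here and are not part of the paper. A missing from-loc or to-loc is represented by `None` and never matches.

```python
from typing import Callable, NamedTuple, Optional, Sequence


class Flow(NamedTuple):
    name: str
    theme: str
    from_loc: Optional[str] = None
    to_loc: Optional[str] = None


def connected(e1: Flow, e2: Flow) -> bool:
    return (e1.theme == e2.theme
            and e1.to_loc is not None
            and e1.to_loc == e2.from_loc)


def immediately_precede(e1, e2, events: Sequence[Flow],
                        precedes: Callable[[Flow, Flow], bool]) -> bool:
    # e1 precedes e2 and no other event x lies between them.
    return precedes(e1, e2) and not any(
        x != e1 and precedes(x, e2) and precedes(e1, x) for x in events)


def immediately_connected(e1, e2, events, precedes) -> bool:
    return immediately_precede(e1, e2, events, precedes) and connected(e1, e2)


def is_island(a: Sequence[Flow], precedes) -> bool:
    # First event connected to the last one, and every other event preceded
    # by some member of the set.
    return connected(a[0], a[-1]) and all(
        any(precedes(y, x) for y in a) for x in a[1:])


# Three events of fish respiration, in the order in which they occur.
to_gills = Flow("e1.1", "water", "outside", "gills")
over_gills = Flow("e1.2", "water")
from_gills = Flow("e1.3", "water", "gills", "outside")
order = [to_gills, over_gills, from_gills]


def precedes(x: Flow, y: Flow) -> bool:
    return order.index(x) < order.index(y)


island = is_island(order, precedes)
```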

10.1. Algorithm A: for recognizing strong-precedence or precedence relationships for flow

Input: E, an event described by a flow primitive to be integrated in the representation.

Output: L, a list of the form ((e1, e2, ..., ei) (ev1, ev2, ..., evi) ... (eve1, eve2, ..., evei)) indicating the position of an event in the representation, where ei, evi and evei are events and each event in each sublist is connected to the next event in that sublist by a strong-precedence or a precedence relationship.

1. If L is empty, open a sublist in L and insert E into that sublist.
2. If L is not empty:
   2.1. Find an event in L that is immediately connected to E. If such event exists, integrate E after that event.
   2.2. Else find an event in L that is connected to E. If such event exists, integrate E after that event.
   2.3. Else find an event in L whose theme is the same as the theme of E. If such event exists, say ek, integrate E at the end of the sublist of which ek is a member.
   2.4. Else open a new sublist in L and insert E in that sublist.
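The four rules can be sketched in Python as follows. This is a simplified reading, not the implemented program: the names `Flow`, `connected` and `integrate` are hypothetical, and rule 2.1 is approximated by looking for a connected event that currently ends its sublist, since the precedence information needed for the full definition of immediately-connected is not modelled. The events are the five used to illustrate the algorithm in the text.

```python
from typing import List, NamedTuple, Optional


class Flow(NamedTuple):
    name: str
    theme: str
    from_loc: Optional[str] = None
    to_loc: Optional[str] = None


def connected(e1: Flow, e2: Flow) -> bool:
    return (e1.theme == e2.theme
            and e1.to_loc is not None
            and e1.to_loc == e2.from_loc)


def integrate(L: List[List[Flow]], E: Flow) -> List[List[Flow]]:
    """Insert E into L following rules 1 and 2.1-2.4 (simplified)."""
    # Rule 2.1 (approximation): a connected event that ends its sublist.
    for sublist in L:
        if connected(sublist[-1], E):
            sublist.append(E)
            return L
    # Rule 2.2: any connected event; E goes right after it.
    for sublist in L:
        for i, ev in enumerate(sublist):
            if connected(ev, E):
                sublist.insert(i + 1, E)
                return L
    # Rule 2.3: same theme; E goes at the end of that sublist.
    for sublist in L:
        if any(ev.theme == E.theme for ev in sublist):
            sublist.append(E)
            return L
    # Rules 1 and 2.4: open a new sublist.
    L.append([E])
    return L


events = [
    Flow("e1", "water", "outside", "mouth"),
    Flow("e2", "water", "mouth", "pharynx"),
    Flow("e3", "blood", "body", "gills"),
    Flow("e4", "water"),                      # water flows over the gills
    Flow("e5", "water", "pharynx", "gills"),
]
L: List[List[Flow]] = []
for ev in events:
    integrate(L, ev)
names = [[ev.name for ev in sublist] for sublist in L]
```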

Let us illustrate the algorithm with the following examples. Suppose that L is empty and the event (e1), water flows from outside into the mouth of fish, is to be integrated in it. Because L is empty, the integration of e1 gives L = ((e1)). If the next event to be integrated is (e2) from the mouth water flows into the pharynx, L becomes equal to ((e1, e2)) because e1 and e2 are immediately connected. If the next event read is (e3) blood flows to the gills from the body of the fish, L becomes ((e1, e2) (e3)) by rule (2.4). If the next event read is (e4) water flows over the gills, L becomes ((e1, e2, e4) (e3)), by rule (2.3). Finally, if the next event is (e5) from the pharynx water flows into the gills, L becomes ((e1, e2, e5, e4) (e3)), by rule (2.2). Note that the algorithm can integrate the events in the correct position even if the narrator does not introduce them in the order in which the events occur. Of course, the discourse relationships are going to suffer if the order in which the events are presented by the narrator differs considerably from the order in which the events occur. Next, we give the two algorithms to form subevent hierarchies.

10.2. Algorithm B: form hierarchies of those events immediately connected

Input: L, a list of sublists containing events ordered by a strong-precedence or a precedence relationship. L is the output of algorithm A.

Output: a subevent hierarchy.

1. Collect all events in L that are immediately connected into the set A. Let A = {e1, e2, ..., ei}. Create a new event, say Ek (in the actual program a new gensym), such that Ek(theme) = e1(theme), Ek(from-loc) = e1(from-loc) and Ek(to-loc) = ei(to-loc). Increment k.
2. Link all events in A as subevents of event Ek.
3. Remove all events in the set A from L and add Ek to L.
4. Repeat until there are no events in L that are immediately connected.

10.3. Algorithm C: form hierarchies of those events that form an island

Input: L, a list of sublists containing events ordered by a strong-precedence or a precedence relationship. L is the output of algorithms A and B.

Output: a subevent hierarchy.

1. Collect all events in L that form an island into the set A. Let A = {e1, e2, ..., ei}. Create a new event, say Ek (in the actual program, a new gensym), such that Ek(theme) = e1(theme), Ek(from-loc) = e1(from-loc) and Ek(to-loc) = ei(to-loc).
2. Link all events in A as subevents of event Ek.
3. Remove all events in the set A from L and add Ek to L.
4. Repeat until there are no events in L that form an island.

If algorithm A is given as input the events describing a flow relationship in the text about fish respiration, it will output the following: L = ((e1.1.1 e1.1.2 e1.1.3 e1.2 e1.3) (e2.1 e2.2 e2.3)). The names given for the events correspond to those shown in Fig. 3. We repeat those events here: e1.1.1 water flows into the mouth from outside, e1.1.2 water flows from the mouth to pharynx, e1.1.3 water flows from pharynx to gills, e1.2 water flows over gills, e1.3 water flows to the outside from the gills, e2.1 blood flows to the gills from the body, e2.2 blood passes through the gills, e2.3 blood flows to the body from the gills. We will illustrate algorithms B and C by showing how they will build the hierarchy in Fig. 3.

Algorithm B will reduce the first sublist in L to: (E001 e1.2 e1.3) because e1.1.1, e1.1.2 and e1.1.3 are immediately connected. These three events become subevents of E001, which reads as water flows to the gills from outside and is event e1.1 in Fig. 3. As one can see, the to-loc of E001 is the to-loc of e1.1.3 and the from-loc of E001 is the from-loc of e1.1.1. Algorithm B leaves the second sublist in L untouched because the events in it are not immediately connected.

Algorithm C reduces the sublist (E001 e1.2 e1.3) in L to (E002) because the to-loc of E001, gills, is the same as the from-loc of e1.3, gills; that is, they form an island. The algorithm makes E001, e1.2 and e1.3 subevents of E002, which reads as water flows from outside to outside and is event e1 in Fig. 3. Algorithm C also reduces the sublist (e2.1 e2.2 e2.3) in L to (E003) because the events in it form an island. Events e2.1, e2.2 and e2.3 become subevents of E003, which reads as blood flows to the body from body through gills and is event e2 in Fig. 3. Consequently, L is reduced to (E002 E003), the two top nodes of the hierarchy in Fig. 3. The hierarchy was built bottom-up.
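The two reductions can be reproduced with a short Python sketch. It is a simplified illustration: the names `Flow`, `collapse`, `algorithm_b` and `algorithm_c` are hypothetical, 'immediately connected' is approximated by adjacency within a sublist, and a whole sublist is treated as the candidate island. The `hierarchy` dictionary records the subevent links that are created.

```python
import itertools
from typing import Dict, List, NamedTuple, Optional


class Flow(NamedTuple):
    name: str
    theme: str
    from_loc: Optional[str] = None
    to_loc: Optional[str] = None


def connected(e1: Flow, e2: Flow) -> bool:
    return (e1.theme == e2.theme and e1.to_loc is not None
            and e1.to_loc == e2.from_loc)


def collapse(run: List[Flow], counter, hierarchy: Dict[str, List[str]]) -> Flow:
    """Replace a run of events by a new parent event Ek (steps 1-3)."""
    if len(run) == 1:
        return run[0]
    parent = Flow(f"E{next(counter):03d}", run[0].theme,
                  run[0].from_loc, run[-1].to_loc)
    hierarchy[parent.name] = [e.name for e in run]
    return parent


def algorithm_b(L, counter, hierarchy):
    """Collapse maximal runs of adjacent connected events in each sublist."""
    out = []
    for sublist in L:
        reduced, run = [], [sublist[0]]
        for ev in sublist[1:]:
            if connected(run[-1], ev):
                run.append(ev)
            else:
                reduced.append(collapse(run, counter, hierarchy))
                run = [ev]
        reduced.append(collapse(run, counter, hierarchy))
        out.append(reduced)
    return out


def algorithm_c(L, counter, hierarchy):
    """Collapse each sublist whose first event is connected to its last."""
    return [[collapse(s, counter, hierarchy)]
            if len(s) > 1 and connected(s[0], s[-1]) else s
            for s in L]


L = [[Flow("e1.1.1", "water", "outside", "mouth"),
      Flow("e1.1.2", "water", "mouth", "pharynx"),
      Flow("e1.1.3", "water", "pharynx", "gills"),
      Flow("e1.2", "water"),
      Flow("e1.3", "water", "gills", "outside")],
     [Flow("e2.1", "blood", "body", "gills"),
      Flow("e2.2", "blood"),
      Flow("e2.3", "blood", "gills", "body")]]

counter, hierarchy = itertools.count(1), {}
after_b = algorithm_b(L, counter, hierarchy)
after_c = algorithm_c(after_b, counter, hierarchy)
```

A single pass of `algorithm_b` is enough in this sketch because the runs it collapses are maximal.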

The integration of the strong-during and while-merge relationships has no influence on the construction of the subevent hierarchy and that is why it is better to introduce them after the algorithms for the hierarchy construction. As the algorithms are implemented currently, the while-merge relationship needs to be explicitly stated by using some type of temporal conjunction, e.g. as water and blood flow through the gills, oxygen in the water passes to the blood and carbon dioxide in the blood passes to the water. The temporal conjunction (as, while, when, etc.) is just a cue to recognize a while-merge relationship. The most important key in recognizing a while-merge relationship is the flow primitive underlying the events connected by the temporal conjunction. If the flow relationship is absent, the present algorithm will fail to recognize a while-merge relationship unless a considerable amount of specific knowledge about the situation is incorporated into the algorithm.

The strong-during relationship is altogether a different situation because it can appear in the narration without being explicitly introduced by the narrator. The following algorithm recognizes strong-during relationships in the context of a system described by a flow relationship.

10.4. Algorithm D: recognize strong-during relationships for flow

Input: E, an event to be integrated in L, the list containing the representation being built.
Output: an event, say ek, in L such that s-dur(ek, E).

1. If the relationship of event E is an action, but is not the flow action and there is an event, say ek, in a sublist of L, say li, whose theme is identical to the theme of E, ek(theme) = E(theme), then:
2. Find the event in li whose to-loc is identical to the at-loc of E. If no event is found, exit with failure. If more than one event exists with such property, select the one at the end of li, the event most recently read for that sublist. Create the relationship s-dur(ek, E).
3. Else exit with failure.

Suppose that L contains the following two events: water flows from the pharynx to the gills and blood flows from the body to the gills, called e1 and e2, respectively. Algorithm A will produce L = ((e1) (e2)). The next event to be integrated is e3: in the gills, blood picks up oxygen from the water in the gills. The action of this event is pick, which is not an instance of a flow action. The theme of the event is 'blood', which is the theme of event e2 in L. The at-loc of e3 is 'gills', which is the to-loc of event e2 in L. This will result in integrating e3 as s-dur(e3, e2). If the next event read is e4: in the gills, water picks carbon dioxide from the blood, e4 will be integrated as s-dur(e2, e4). Note that the algorithm does not have a way to find out that the two s-dur relationships combined are really a while-merge. The algorithm could establish that events e4 and e6 are connected to events e3 and e2 by a while-merge relationship by using an inference rule, but discussing that is clearly beyond the limits of this paper. Other ways besides the at-loc case to recognize the s-during relationship are, of course, the explicit connection of two events by a temporal conjunction, e.g. while the blood passes through the lungs, it picks up oxygen.
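Steps 1 to 3 of Algorithm D can be sketched as a single lookup. The Python fragment below follows the statement of the algorithm only. The class `Event`, the function `strong_during` and the convention of returning `None` on failure are assumptions made here. The returned event plays the role of ek in the algorithm's output line.

```python
from typing import List, NamedTuple, Optional


class Event(NamedTuple):
    name: str
    action: str
    theme: str
    from_loc: Optional[str] = None
    to_loc: Optional[str] = None
    at_loc: Optional[str] = None


def strong_during(L: List[List[Event]], E: Event) -> Optional[Event]:
    """Return the event ek of L selected by steps 1-2, or None on failure."""
    if E.action == "flow" or E.at_loc is None:
        return None                               # step 3: not applicable
    for sublist in L:
        # Step 1: a sublist containing an event with the same theme as E.
        if any(ev.theme == E.theme for ev in sublist):
            # Step 2: events whose to-loc is the at-loc of E, keeping the
            # one closest to the end of the sublist.
            matches = [ev for ev in sublist if ev.to_loc == E.at_loc]
            return matches[-1] if matches else None
    return None


water_flow = Event("e1", "flow", "water", "pharynx", "gills")
blood_flow = Event("e2", "flow", "blood", "body", "gills")
L = [[water_flow], [blood_flow]]

# 'in the gills, blood picks up oxygen from the water in the gills'
pick_up = Event("e3", "pick", "blood", at_loc="gills")
host = strong_during(L, pick_up)
```

The sketch checks only the first sublist that shares the theme of E, which is a simplification for the case where each theme occurs in a single sublist.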

Descriptive relations, those introduced by descriptive verbs, e.g. the heart contains four chambers, are filtered and integrated into the object-based representation by the algorithms explained in [23].

11. Conclusions

The representation of knowledge based on the semantics of the is-a relationship has been one of the most powerful knowledge representation techniques and is, undoubtedly, the basis for knowledge organization and reasoning for object centered representations. In this paper, we have introduced a representation technique based on the subevent relationship, which, unlike the is-a relationship, organizes events rather than objects, but which, like the is-a relationship, organizes them in a hierarchical structure. We have also shown how event centered representations based on the subevent relationship can be integrated within object centered representations based on the is-a relationship, and that the proposed representation provides a bridge between texts describing systems and deeper representations of such systems. Finally, we have described the reasoning algorithms that the proposed representation permits and provided algorithms, analogous to the classifiers for object centered representations, for the construction of the subevent hierarchies. The algorithms described here are being applied to the acquisition of knowledge from encyclopedic texts [13].

Acknowledgements

This research has been funded in part by NASA-KSC Contract NAG-I-0058. This paper benefitted from discussions with Carlos Segami. I am also grateful to Donna Markey, Richard Hull and two anonymous referees for comments on earlier versions of this paper.

References

[1] B.M. Sundheim, Overview of the third message understanding evaluation and conference, in: Proceedings of the Third Message Understanding Conference (MUC-3), Morgan Kaufmann, Palo Alto, CA, 1991, pp. 3-16.

[2] B.M. Sundheim, Overview of the fourth message understanding evaluation and conference, in: Proceedings of the Fourth Message Understanding Conference (MUC-4), Morgan Kaufmann, Palo Alto, CA, 1992, pp. 3-21.

[3] J. Cowie, W. Lehnert, Information extraction, Communications of the ACM 39 (1) (1996) 80-91.

[4] S. Matwin, S. Szpakowicz, Text analysis: how can machine learning help?, in: Proceedings of the First Conference of the Pacific Association for Computational Linguistics, Vancouver, 1993, pp. 33-42.

[5] B. Moulin, D. Rousseau, SACD: a system for acquiring knowledge from regulatory texts, Computers and Electrical Engineering 20 (2) (1994) 131-149.

[6] S. Delisle, Text Processing without a-priori knowledge: semi-automatic linguistic analysis for incremental knowledge acquisition, Ph.D. thesis, Department of Computer Science, University of Ottawa, 1993.

[7] S. Szpakowicz, Semi-automatic acquisition of conceptual structures from technical texts, International Journal of Man-Machine Studies 33 (1990) 385-397.

[8] F. Gomez, C. Segami, Knowledge acquisition from natural language for expert systems based on classification problem-solving methods, Knowledge Acquisition 2 (1990) 107-128.

[9] J. Boose, Personal construct theory and the transfer of human expertise, Proceedings of the National Conference on Artificial Intelligence, Austin, TX, 1984, pp. 27-33.

[10] J. Boose, J. Bradshaw, Expertise transfer and complex problems: using AQUINAS as a knowledge acquisition workbench for expert systems, International Journal of Man-Machine Studies 26 (1) (1987) 3-28.

[11] R. Gaines, M.L.G. Shaw, Eliciting knowledge and transferring it effectively to a knowledge-based system, IEEE Transactions on Knowledge and Data Engineering 5 (1) (1993) 4-14.

[12] J. Batali, Automatic Acquisition and Use of Some of the Knowledge in Physics Texts, Doctoral Dissertation, MIT, 1991.

[13] F. Gomez, R. Hull, C. Segami, Acquiring knowledge from encyclopedic texts, in: Proceedings of the ACL's 4th Conference on Applied Natural Language Processing, Stuttgart, Germany, 1994, pp. 84-90.

[14] DARPA, Proceedings of the TIPSTER Text Program, Fredericksburg, VA, Morgan Kaufmann, Palo Alto, CA, 1993.

[15] F. Gomez, C. Segami, R. Hull, Determining prepositional attachment, prepositional meaning, verb meaning and thematic roles, Computational Intelligence 13 (1) (1997) 1-32.

[16] F. Gomez, C. Segami, Classification-based reasoning, IEEE Transactions on Systems, Man and Cybernetics 21 (3) (1991) 644-659.

[17] K. Forbus, Qualitative physics: past, present and future, in: Exploring Artificial Intelligence, H. Shrobe (Ed.), Morgan Kaufmann Publishers, Palo Alto, CA, 1988.

[18] B. Chandrasekaran, Design problem solving, AI Magazine 11 (4) (1990) 59-71.

[19] A. Goel, B. Chandrasekaran, Functional representation of designs and redesign problem solving, in: Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, 1989, Detroit, USA, pp. 1388-1394.

[20] J.F. Allen, Towards a general theory of action and time, Artificial Intelligence 23 (1984) 123-154.

[21] D. McDermott, A temporal logic for reasoning about processes and plans, Cognitive Science 6 (1982) 101-155.

[22] J.F. Allen, Maintaining knowledge about time intervals, Communications of the ACM 26 (1983) 832-843.

[23] F. Gomez, Acquiring intersentential explanatory connections in expository texts, International Journal of Human-Computer Studies 44 (1996) 19-44.

[24] R. Brachman, J. Schmolze, An overview of the KL-ONE knowledge representation system, Cognitive Science 9 (1985) 171-216.