AI Growing Up: The Changes and Opportunities

James F. Allen

AI Magazine, Winter 1998

Copyright 1998, American Association for Artificial Intelligence. All rights reserved. 0738-4602-1998 / $2.00



Many people make many confusing claims about the aims and potential for success of work in AI. Much of this arises from a misunderstanding of the nature of work in the field. In this article, I examine the field and make some observations that I think would help alleviate some of the problems. I also argue that the field is at a major turning point, and that this will substantially change the work done and the way it is evaluated.

AI has always been a strange field. Where else could you find a field where people with no technical background feel completely comfortable making claims about viability and progress? We regularly see articles in the popular press and books telling us that AI is impossible, although it is not clear what the authors of these publications mean by that claim. Other sources tell us that AI is just around the corner or that it's already with us. Unlike fields such as biology or physics, apparently you don't need any technical expertise in order to evaluate what's going on in this field.

But such problems are not limited to the general public. Even within AI, the researchers themselves have sometimes misjudged the difficulty of problems and have oversold the prospects for short-term progress based on initial results. As a result, they have set themselves up for failure to meet those projections. Even more puzzling, they also downplay successes to the point where, if a project becomes successful, it almost defines itself out of the field. An excellent example is the recent success of the chess-playing program DEEP BLUE, which beat the world chess champion in 1997. Many AI researchers have spent some effort to distance themselves from this success, claiming that the chess program has no intelligence in it, and hence it is not AI. I think this is simply wrong and will spend some time trying to argue why.

So how can we explain this strange behavior? I can't give a complete answer but can point out some contributing causes. First, unlike most other sciences, everyone feels that they are familiar with the object of study in AI, namely, human intelligence. Because we as people are so complex, and so much of our behavior is subconscious and automatic, we have a hard time estimating the difficulty of tasks that we do. Consequently, we have a hard time accurately evaluating progress. For instance, to us, processes such as seeing, language understanding, and commonsense reasoning appear straightforward and obvious, while tasks such as doing mathematics or chess playing seem difficult. In contrast, it is the perceptual and commonsense reasoning tasks that are most difficult for machines to capture, whereas more formal problems like game playing are much more manageable. Thus, the generalizations that one might make by thinking of machines as people will tend to be highly inaccurate. For instance, one might think that because a machine can play excellent chess, it must be superintelligent, possibly more intelligent than any human. In order to defend against this threat, people feel they have to put down the work. In contrast, if machines become capable of some truly difficult task such as understanding natural language, people may very well take this in stride and focus on how intelligent the system appears to be given what it says.

This inability to fathom the true difficulties involved is not limited to the general public. As mentioned previously, researchers themselves can be equally wrong in their evaluations. In the researchers' defense, the field is very young.



By analogy to a mature science such as physics, we are at a stage prior to the development of calculus and Newton's laws. Our youth should be evident from the fact that many in the first generation of AI researchers are still active. We still haven't produced a good, clear explanation of what the field is, what its goals are, and what methodologies are appropriate in research. I'm going to spend a fair amount of my time today on some of these thornier issues that define what the field is.

It might help to draw a parallel between the development of AI and the stages of human development. I should warn you, however, that this is a highly idealized version of how humans develop, and as you'll see, it's a little optimistic. In this naive model, there are three stages of development. First, we go through an infancy and a childhood, where we develop basic skills. There's a very strong emphasis on creative exploration, and we're governed more by emotion than by reason. Then we move into the second stage, adolescence, where we develop a sense of self as well as self-reliance and discipline. We acquire the skills and habits that are going to last us a lifetime. Finally, we attain adulthood, where we apply and further develop our skills with practice. We train future generations and ultimately attain a very deep understanding of life.

So, how does this apply to scientific disciplines? Let's consider how this model applies to the development of modern aviation based on fixed-wing aircraft. The infancy of aviation was a long period of time, several thousand years, where people experimented with balloons and artificial wings and hopping machines, and they jumped off cliffs and did many other things in an attempt to fly. Adolescence began with the development of engines light enough to be able to lift themselves off the ground. These first prototypes also got the development of aircraft off the ground! Although the Kitty Hawk wasn't a viable airplane and stayed in the air only for a few seconds, it actually was the first time in fixed-wing aviation that there was an experiment that achieved the goal even partially. From then on, researchers could measure incremental progress, and a burst of work shortly led to the first viable airplanes. The field eventually moved on to adulthood with the development of the science of aeronautics and a deep understanding of flight that is used today for a wide range of applications.

So, at what stage is AI? Well, we've certainly been having a childhood, and in fact, we may still be in it. We can see the typical properties of childhood, such as learning of basic capabilities. We've made tremendous progress in a wide range of areas over the last few decades, including theories of search, representation, pattern matching and classification, and learning, and a host of applications, including robotics, natural language, and vision. In addition, we've certainly been governed more by emotion than rationality. We've been prone to making outrageous claims; we've tended to fall under the sway of the popular kids on the block, with everybody rushing off and pursuing the latest popular ideas; and we've certainly succumbed to bullies. We've certainly experienced a childhood.

Equally obvious, we're far from adulthood. We don't yet have a good sense of self in that we have no consensus on a definition of the field, and we need to learn some responsibility and self-control in that we have no generally agreed-upon methodologies for AI research. So the critical question is whether we've moved into adolescence yet, and if so, how far have we developed? I would claim that we're right at the beginning of adolescence. We are at a similar transition point to the first flight in aviation. The field of aviation was changed dramatically by the development of working prototypes because, for the first time, experimental work could be supported. Until then, there was no reasonable form of evaluation because no one could get off the ground. Consequently, it was hard for the field to progress. After the first flights, there was a tremendous surge of progress and development in aviation.

Given all this, I believe that we're at a similar transition point to the first flight because we are now able to construct simple working artifacts, which then can be used to support experimental work. This is a critical event in the development of the field that I believe will revolutionize the way the field operates and the way it is perceived. Now, some people will claim that we have been building working systems for decades in AI, so I need to qualify what I mean by working. For much of AI's brief history, a system has been said to work if it could run on a set of predefined or canned scenarios. That's not what I mean by working. Rather, I say a system works if we can specify a domain that supports a range of tasks, and the system can handle randomly generated or selected tasks in that domain with a reasonable degree of success. The domain and tasks may be precisely specified, as in game playing, where the rules completely define the possible behaviors and outcomes, or more abstractly defined, as in tasks to support problem solving, or only defined by naturally occurring behaviors, such as in visual interpretation or spoken language understanding.


In all cases, the system works if it can perform its tasks successfully over a wide range of specific situations and problems within the domain.
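
To make this criterion concrete, here is a minimal sketch, in Python, of the kind of evaluation harness it implies: success is measured on freshly generated tasks from a specified domain, never on a fixed set of canned scenarios. The domain here (integer addition) and all the names are my own hypothetical choices, not anything from the article.

    import random

    # A sketch of the "working" criterion: randomly generated tasks from a
    # specified domain, with a domain-defined test of success.

    def random_task(rng: random.Random) -> tuple[int, int]:
        """Draw a fresh task from the domain; nothing is predefined."""
        return rng.randint(0, 999), rng.randint(0, 999)

    def system_under_test(task: tuple[int, int]) -> int:
        """Stand-in for the AI system being evaluated."""
        a, b = task
        return a + b

    def success_rate(trials: int = 100, seed: int = 1) -> float:
        """The system 'works' if this rate is reasonably high."""
        rng = random.Random(seed)
        hits = 0
        for _ in range(trials):
            task = random_task(rng)
            hits += system_under_test(task) == sum(task)  # domain-defined check
        return hits / trials

    print(f"success rate: {success_rate():.0%}")

The point of the sketch is only that the task generator, not the system builder, decides what the system will face.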

Depending on the area, we have different degrees of sophistication in working systems. In well-circumscribed domains, such as game playing, we have high-performance working systems. Most newsworthy lately, of course, we now have a chess-playing program that can beat the world champion. Even in more general domains, we can see substantial progress. Expert systems are widely used in industry to do things such as controlling factories and diagnosing faults and problems in mechanical devices, although such applications are often not publicized as AI. We have robots crawling on Mars, and spacecraft that will be controlled using AI planning and agent technology. We have systems that can roughly classify the content of news articles and e-mail messages and conversational agents that can carry on simple conversations with people using speech. Some of these systems are quite primitive compared to what we would like, but they're all working by the definition I gave earlier. You can give them new scenarios and have them go through and actually do something. With the capability to evaluate, we can start the long process of incremental progress that is essential for the development of the science.

Given the progress mentioned previously, why does AI have such a bad name in certain circles, and why do many still think AI is impossible or that the accomplishments so far are not really AI work? A large part of the problem is that the goals of the field are misunderstood, not only by the layperson but, even more damaging, by the researchers themselves. As a result, it can seem that AI is pursuing a hopeless dream and that all these success stories are not really the product of AI research. It is critical that we reexamine the goals of the field and attempt to define it in a way that is inclusive: a definition that embraces the diversity of work and does not exclude large sections of the field.

What Is AI?

I looked in the introductions of five introductory textbooks on AI and collected the definitions that were presented. I was surprised at the range of different definitions and the lack of an overall consensus. But one thing they all had in common was that all the definitions focused on what the artificial part was, but intelligence was left undefined. In fact, most definitions of AI used the word intelligence in the definition, leaving that term completely unexamined. I think this causes considerable social problems in the field, and so I will attempt to elaborate on the notion of intelligence and in what sense it can be artificial.

Let me start with definitions that should not be used to define the field of AI because they are not broad enough and encourage exclusion of work rather than inclusion. AI is not the science of building artificial people. It's not the science of understanding human intelligence. It's not even the science of trying to build artifacts that can imitate human behavior well enough to fool someone that the machine is human, as proposed in the famous Turing test. These are all fine motivations of individual researchers, but they don't define the field. Specifically, they are not inclusive enough to include the breadth of work in the field, and they encourage fragmentation.

These definitions also lead to behavior where success is downplayed because it falls short of these definitions of AI. To illustrate this, I want to give my favorite ways of putting down or excluding AI research. Most recently, I've heard variants of all of these in response to the success in chess playing. The first attack can be paraphrased as, "Well, the program only does one thing (say, play chess), and it's not aware of the larger context. Therefore it's not intelligent." The underlying assumption made here is that intelligence means that a system has to duplicate a person, and since people are obviously aware of a much larger context and do many things, anything that can't do that is not intelligent. Note that this is an absolute statement. Such people are not saying that the system is not very intelligent compared to human abilities; rather, they are claiming flat out that it is not intelligent. In fact, many go further and use this argument to claim that current chess-playing programs don't have anything to do with AI research. The second way is really a variant on this theme and can be paraphrased as, "Well, people don't work that way, so this is not intelligent." A quite different reason, and my all-time favorite, can be paraphrased as, "Well, I understand how that system works, so it can't be intelligent." It seems that there has to be something mystical about intelligence that forces us to say that if we can understand the mechanism, then it lacks intelligence. If something is just a mechanism and deterministic, then it can't be intelligent. These three methods of excluding work from the field are all motivated by the definitions mentioned previously.

So, if that's what AI is not, what is AI?



Let's start with this mystical word intelligence and try to tease out a few properties of it. Then we'll deal with the artificial part and see if we can come up with a better definition of the field.

Readers familiar with work in knowledge representation or philosophy or linguistics will realize that we are unlikely to produce a precise definition of intelligence. They know that most words that describe properties or categories of things in English or any other natural language inherently lack precise definitions. For instance, consider trying to define the notion of a chair. We might try to define a chair by its physical characteristics; say, it must have some legs and a back and a place that you sit. But counterexamples are easy to find: there are chairs that don't have legs because they consist of solid blocks. In addition, you can find chairs that only have little backs or no back at all. And I'm sure someone has made a chair out of stone. There's a sliding scale between what a chair is and what is just a big stone. So it seems impossible to define the notion of a chair in terms of its physical characteristics. You might try to define a chair in terms of functional characteristics. We might want to define a chair as something that we sit on. Of course, this doesn't exclude large rocks that we sit on. But what about a stone that someone intends to use as a chair but has never yet been sat on? And if a chair seats more than one person, isn't it a sofa, not a chair? So does a chair stop being a chair if we can squeeze two people into it? Again, we have a continuum of functional definitions between chairs and sofas.

While there are no precise defining criteria, we're all perfectly comfortable with the notion of a chair. This is the kind of concept we deal with all the time, concepts that are given the term natural kinds by philosophers. Given that we can't define the notion of a chair precisely, we can't expect a precise definition of a more complex thing such as intelligence. But we can learn some properties of it and maybe develop some common intuitions about intelligence that will help us define AI.

One way to explore the notion of intelligence is to look at work by people who have studied it in depth, for instance, in the education literature. And what we see there is the suggestion that there really is no single form of intelligence. Rather, there are many different forms of intelligence, or different intelligences, that people have. There's analytical intelligence, which consists of the skills that are measured by typical intelligence tests. While some people have taken that as the definition of intelligence, many feel that it captures only one particular skill that people may have. But there are many other aspects to intelligence or types of intelligence. There's creative intelligence, which reflects one's ability to be innovative and produce original ideas. There's communicative intelligence, reflecting our communication skills, and physical intelligence, reflecting our agility and skill at sports, as well as others. While theories differ on the number of types of intelligence, it is clear from this literature that there is no single notion of intelligence. Rather, the notion is being broken down into particular capabilities.

Next, we can try a linguistic analysis of the term. The word intelligence is a nominalization of the adjective intelligent. It is an interesting fact that most properties described by adjectives are relative to the thing you're modifying. For example, even the quite concrete adjective large means something different when talking about a mouse and when talking about an elephant. The same thing happens with intelligent. We can argue about whether my dog's smarter than your dog by saying things like, "My dog knows five tricks; yours only knows three. My dog knows 20 words of English and can do all sorts of things, and yours hardly knows anything!" While you may dispute the facts of the case, we generally agree on what properties we are talking about. But you see a very different argument if a cat owner is arguing with a dog owner about which pet is more intelligent. The dog person will say, "My dog does more tricks," and the cat owner replies, "Well, why would you want a pet to do tricks? My cat's much more independent." The argument rapidly changes form and becomes a debate about what intelligence is. And there's never a resolution to this because people are using different criteria when applying the notion to dogs than to cats. So, intelligence is a relative concept that may capture different properties depending on the thing it modifies.

So, now for some empirical work. Given intelligence is relative, what kinds of things can be said to be intelligent? Clearly we apply the term to people, and earlier, I illustrated the case of applying the term to cats and dogs. In general, I think we are comfortable using the term for all mammals and probably other vertebrates. For instance, we could talk about one iguana being smarter than another iguana because of the way it interacts with its environment. Note that there's no magic property of intelligence that all these animals have; rather, the notion may change from species to species. But we are comfortable comparing the intelligence of animals of like species. Does the term apply beyond the animal kingdom? I am a little uncertain about bacteria.


Initially, I thought not, but on second thought, I imagine some microbiologist could convince me that some bacteria are more intelligent than other bacteria. Again, though, note that the notion of intelligence as applied to bacteria may have nothing to do with intelligence as applied to humans.

What about machines? The advertising industry is absolutely positive that we have intelligent machines. We have intelligent cars, intelligent microwaves; all sorts of different machines have some notion of intelligence. And while I'd never depend on the advertising industry to define anything accurately, this does indicate that people generally feel comfortable applying the notion of intelligence to machines. Certain machines will do something better than other ones and might adapt better and, thus, are more intelligent at the functions they perform. While some AI researchers and philosophers might think it outrageous to claim that a thermostat can be intelligent, I doubt that most other people have difficulty with the concept. Clearly, one thermostat can be better at sensing and controlling the environment than another, and we are comfortable characterizing this difference as a difference in intelligence.

The only things I can think of that aren't able to have intelligence in some sense of the word are inert things like rocks. I have not been able to think of a possible scenario where someone could convince me that one rock's smarter than another one.

So what can we learn from this exploration? We have developed a notion of species-specific, or more interestingly, task-specific intelligence, which is a measure of how well some entity can perform a set of tasks. Looking at the previous analysis, a prerequisite for something to be intelligent is that it has some way of sensing the environment and then selecting and performing actions. Note that everything discussed earlier except rocks can do this. To use a very popular term these days, these are all things that we might call agents.

So intelligence is a term applied to agents that characterizes the degree to which they are successful at performing certain tasks. The term is relative both to the agent and to the task you're talking about. Notice that this definition doesn't require, nor does it prohibit, a symbolic representation of the world, or acting like a human, or anything else. Note also that it means something can have intelligence even if we understand how it works.
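
As a rough illustration of this agent-relative view (my own sketch, not a formalism from the article), one might render the sense-act prerequisite and the task-relative score like this; the thermostat and all the names are hypothetical:

    from typing import Protocol

    # An agent is anything that senses its environment and selects actions;
    # "intelligence" is then a score that is meaningful only for comparing
    # agents on the same task.

    class Agent(Protocol):
        def sense(self, environment: dict) -> dict: ...
        def act(self, percept: dict) -> str: ...

    class Thermostat:
        """Even a thermostat qualifies: it senses and selects actions."""
        def __init__(self, setpoint: float) -> None:
            self.setpoint = setpoint

        def sense(self, environment: dict) -> dict:
            return {"temp": environment["temp"]}

        def act(self, percept: dict) -> str:
            return "heat_on" if percept["temp"] < self.setpoint else "heat_off"

    def intelligence(agent: Agent, episodes: list[dict], score) -> float:
        """Average task success; relative to both the agent and the task.
        score(agent, episode) is supplied by whoever defines the task."""
        return sum(score(agent, ep) for ep in episodes) / len(episodes)

Nothing in the sketch mentions symbolic representations or human likeness; the number it produces compares agents on one task and says nothing absolute.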

Given some intuition about the notion of intelligence, we can now consider the artificial part of AI. In what sense is the intelligence we create in a machine artificial? Given the previous discussion, it seems that intelligence as applied to machines to compare different machines is real intelligence; namely, it compares how well they perceive the environment to perform certain tasks. I think that the artificial part of the phrase comes from the fact that we try to get machines to display intelligence that we would normally only apply to people. So it is this switching of senses that makes it artificial. To try an analogy, we might try to develop cat AI by training a dog to behave like a cat. We would then evaluate how good a model we would have for the dog, using the measures we would usually use for cat intelligence.

Pulling this all together, here's my attempt at a one-sentence definition of AI: AI is the science of making machines do tasks that humans can do or try to do. Note that I haven't used the term intelligence in the definition. Instead, I've used a slightly less mystical word, task, which implies the requirement of acting in response to an environment to achieve goals. Note that this definition also is quite inclusive: you could argue in a sense that much of computer science and engineering could be included in this definition. For instance, adding numbers is a task that people do, and so you could argue that building a machine that can add numbers up is AI. And actually, I think that's probably right. I think the inventors of the first adding machines would have seen themselves as the precomputer-era AI pioneers. But in current times we understand very well how to build machines that add numbers, so it's not a particularly interesting issue with regard to intelligence, and the field focuses on the more complex things that people do.

Developing Good Work Habits

With a working definition of the field in hand, the next thing to worry about is developing good work habits, namely, our methodology and modes of research. To start this discussion, I want to describe some of the prime mistakes that we have made in the past. I'll try to do this without insulting specific people and would include myself as one of the offenders as well. After pointing out the mistakes, I want to turn it around and argue that we probably couldn't have done better previously, given what we knew at the time (a principal characteristic of the infancy of the field). But we can do better now.

So here are some mistakes. Each of these has a perfectly reasonable set of starting assumptions but a bad ending result.


The microworld mistake: In order to study a problem, you design a highly simplified microworld that captures part of the problem. You then develop and test systems that can do tasks in this microworld. This is all very fine as the start of the research project. There are two mistakes that follow. One involves generalizing from this initial work to make claims about the original problem without ever testing beyond the microworld. The other involves forgetting about the original problem completely and basing all future work on the simplified microworld.

The abstraction mistake: A related but different problem is the abstraction mistake. Here we identify a few properties of a real task and produce mathematical abstractions of these properties. Again, both of those steps are perfectly good as initial exploration. But then we work with the mathematical abstractions and never come back to the issues that came up in the original task. In some cases, new subfields of research have arisen based solely on this level of abstraction, and work becomes farther and farther removed from the original motivating problems.

The magic bullet mistake: The third class of mistakes covers a wide range of issues I call the magic bullet mistakes. These all use the same form of argument: "We just need to do this one thing, X, and then intelligence is just going to happen. So we don't have to worry about intelligence per se; we're just going to work on X." Again, typically X is a reasonable thing to work on in its own right; the problem is in the claims that are made that cannot be evaluated until the indefinite future. Here are some classic magic bullets:

If we get the right learning algorithm, intelligence is just going to emerge. So all we need to do is study learning.

If we work on basic mechanisms, such as motor control and sensing, intelligence will just emerge.

If we encode enough commonsense facts about the world, intelligence is just going to emerge.

As I said before, all these are reasonable things to be doing, in context, but they're not going to suddenly solve the problem of artificial intelligence. And they shouldn't be sold as that.

From a distance, it's easy to laugh at everyone who has made these mistakes, which includes most researchers in the field. But in fact, in our defense, remember we were in our childhood. We were allowed to make mistakes then; that's part of the evolution of the science. We started off with no workable theories and learned a lot. And given the power of computers until recently, even if we had had great theories, we wouldn't have been able to make them run.

But things have changed now, and we have the ability and opportunity to do better. The opportunity is that we now have a wide range of concepts, techniques, and tools to support intelligent behavior. And in addition to that, we have achieved some basic competence in supporting real-time perception and motor skills. Consider the power of some of the techniques that we have developed: game playing, where we have built the world chess champion; scheduling, where we have fast, heuristic scheduling algorithms that yield dramatic speedups over traditional methods; decision making, where we have expert systems as standard tools in many companies and products; and financial forecasting, where we don't hear much about what people are doing, but Wall Street firms seem to hire AI researchers at a rapid rate.

On the perception side, robots with vision are revolutionizing manufacturing. In addition, we have a robot crawling over Mars, and inexpensive robots are readily available on the retail market for a few hundred dollars. We have viable commercial speech-recognition systems now, supporting over 30,000-word continuous speech, with 95-percent accuracy after training on a single speaker. This is very impressive performance compared to where we were a few years ago and good enough to support many applications.

Given that we have basic perceptual capabilities and we have a wide range of higher-level reasoning techniques, how are we to proceed? First, we must identify the particular intelligence we want to study. It doesn't make sense to try to capture all forms of human intelligence simultaneously. Rather, we must study more mundane and limited tasks (or intelligences). For example, we might want to build a house-cleaning robot or an assistant for logistics planning or a telephone-based automatic customer-service agent. Each one of these tasks defines a particular capability that could support long-range research efforts. For the selected task, we then define a set of telescoping restricted domains, all the way from the long-term dream down to some absolutely trivial domain.

The next step after defining the set of telescoping tasks is to define the rules of evaluation. Ideally, the evaluation measures should apply over the full range of tasks that are possible, from the trivial to the most complex. It is an added bonus if the evaluation criteria can also be applied to people doing the same task, for then we can compare system performance with human performance on the identical tasks.


Now we are ready to develop a model and system for the very simplest task domain. Since the initial task should be highly simplified, we should be able to accomplish this in a relatively short time period and establish some baseline of feasibility for this line of research. We then evaluate the model on the simple task domain. Once we have reasonable performance, we then evaluate the model on the task domain at the next level of complexity up.

By focusing on a simple task first, we can ground the research and evaluate capabilities early on. Evaluating on the next level of complexity up would have a nice sobering effect on the claims that we would then make about how general our systems are. It would also identify the next set of critical problems for the research project. This approach still allows the use of microworlds, abstraction, or magic bullets for preliminary research, but it forces us to continually ground the research back to the task as we go along.

To reiterate the strategy, we study intelligence with respect to a task, and we simply turn real problems into manageable ones by restricting the task rather than the abilities of the machine.
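
A compact way to picture this methodology is as a loop over telescoping domains with a single evaluation gate. This is my own sketch of the loop the text describes; the domain names, the threshold, and both callbacks are assumptions, not project code.

    # Telescoping domains, simplest task first, the long-term dream last.
    TELESCOPING_DOMAINS = [
        "toy route planning",
        "cargo logistics with time and cost",
        "arbitrary problem-solving tasks",
        "unrestricted conversation",
    ]

    def research_program(build_system, evaluate, threshold: float = 0.8) -> None:
        """build_system(domain) -> system; evaluate(system, domain) -> 0..1,
        applying the same evaluation criterion at every level."""
        for level, domain in enumerate(TELESCOPING_DOMAINS):
            system = build_system(domain)     # model for this level only
            score = evaluate(system, domain)  # ground all claims in the task
            print(f"level {level} ({domain}): {score:.2f}")
            if score < threshold:             # sobering: fix the critical
                break                         # problems before generalizing

The design point is that evaluate() stays fixed across levels, so progress claims at one level are directly comparable to the next.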

An Example: Conversational Agents

Let me give a concrete example of this method for research based on our own work at the University of Rochester on conversational agents. We'll start with the dream and then refine that down into more and more simple domains. Then we'll establish the rules of evaluation that we're going to use throughout, and consider what initial steps we've taken.

The dream is a fully conversational, multimedia system that converses with a human level of competence on arbitrary topics, essentially a machine you could just sit down and talk to. Examples in the popular media would include HAL in 2001: A Space Odyssey and the computer in the Star Trek series. Obviously, we're not going to achieve this level of competence in the near future. But we can start refining it down, as shown in figure 1. We can, for instance, consider conversational agents that collaborate with people to solve concrete problem-solving tasks. This includes most forms of interaction that we would probably want a computer to engage in: diagnosis, design, planning, scheduling, information retrieval, command and control, and a wide range of other task-directed activities. This restricted task is still too general for immediate solution, so we refine it again to concentrate only on transportation planning and scheduling tasks.

Once we get down to a fairly concrete level, we can start parameterizing domains along other dimensions. So, with planning and scheduling, for instance, we can parameterize the size of solutions required (the number of actions in a solution); the number of different action types that you need to reason about and their complexity; the complexity of temporal interactions in the domain; reasoning about quantities; static worlds versus dynamic worlds that change as you go along; and so on. There are many different parameters, and it is important to define the different dimensions of the domain so that we have a rich understanding of the problems that are there and those we're abstracting away as we simplify the tasks. (One hypothetical way of recording these dimensions is sketched after figure 1.)

Note that this is a very different way of breaking up the problem than one might take in defining a conversational agent. We might, for instance, have tried to restrict tasks by limiting the vocabulary, the range of language that can be understood, the range of conversational acts handled, or the level of complexity we can handle in the discourse. The problem with simplifying in this way is that it does not simplify the problem for the person who must use the system. Rather, it complicates things, as they must learn a set of rules that restrict their natural behavior. So the computer's task becomes simpler, but the human's task becomes more complicated. Our approach is not to limit the person except by having them focus on solving the task. That way, the complexity of the task they have to solve naturally limits the complexity of the interactions that they typically attempt.


Figure 1. Telescoping Tasks by Restricting Topics for Conversational Agents. (The figure shows three nested levels: Discussing Unrestricted Topics, Arbitrary Problem-Solving Tasks, and Planning and Scheduling Tasks.)
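
As promised above, here is one hypothetical way to record the domain dimensions explicitly; the field names and the sample values are my guesses at plausible settings, not the project's actual parameterization.

    from dataclasses import dataclass

    # Making the dimensions explicit documents what each simplified domain
    # keeps and what it abstracts away.

    @dataclass
    class DomainParameters:
        max_plan_length: int      # size of solutions required
        action_types: int         # distinct action types to reason about
        temporal_complexity: str  # "none", "ordering", or "metric time"
        quantities: bool          # reasoning about amounts (fuel, cargo, ...)
        dynamic_world: bool       # does the world change during planning?

    ROUTE_TOY = DomainParameters(max_plan_length=5, action_types=1,
                                 temporal_complexity="ordering",
                                 quantities=False, dynamic_world=True)
    LOGISTICS = DomainParameters(max_plan_length=20, action_types=4,
                                 temporal_complexity="metric time",
                                 quantities=True, dynamic_world=True)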


Since the goal is a conversational agent that collaborates with the user on planning and scheduling tasks, a natural evaluation criterion is how well the system enhances human performance on the task. This criterion can be applied in any of the different tasks, at whatever level of complexity, and it provides a way to determine whether we're successful or not. In addition, we can use it to compare with human performance on the same tasks and, say, compare a person working with the system versus working alone, or compare it to two people working together.

In addition, because this project is studying conversational agents, an additional restriction is introduced: that spoken language is a primary mode of communication. Note that this is an assumption not directly motivated by the evaluation criteria. If the goal were solely to improve human performance in planning and scheduling tasks, we might need to evaluate whether language is the best way to do it. By stipulating this restriction at the start, however, we focus the research. Additionally, we require that no restrictions be placed on what the user may say and that the system should function as well with novices as with experts. As a consequence, we allow only minimal training, a few minutes at most, before a person is able to use the system.

Given this, in 1995 we explored the fundamental feasibility of this line of work. To do that, we defined the simplest domain we could come up with that still required extended interaction and constructed a system to handle it, TRAINS-95. The TRAINS-95 domain required a person to interact with the system to construct efficient routes for trains given a route map (see an example in figure 2). To complicate matters, in each session, problems such as bad weather were randomly generated in the scenario that only the system initially knew about. In addition, trains would be delayed if they attempted to move along the same track segments at the same time. Much of the effort in TRAINS-95 was aimed at developing techniques for the robust understanding of language in context in the face of speech-recognition errors, and we could get away with quite simple dialogue models. Readers interested in seeing a video of the session can obtain a copy from the web site at www.cs.rochester.edu/research/trains.

We had a few friends try TRAINS and found that they often could successfully solve problems using the system. In other words, it seemed to work, and we could generate good videotapes and example sessions for papers, and so on. In the past, this would have been enough to warrant claiming success. But given that we had achieved some basic competence, new questions arose that we couldn't ask before: How well does it work? What factors contribute to its success? We couldn't ask such questions before because we didn't have a conversational system that worked even badly (that is, drawing a metaphor back to flight, we'd never have gotten a conversational system off the ground).

To better answer how well it does work, we performed an experiment. We recruited some undergraduates, showed them the two-and-one-half-minute videotape, let them practice on the speech recognizer, and then gave them cue cards with the problems to solve. We measured how long each session took and whether it ended in a working plan, and we measured the quality of the solution by considering how long the plan would take to execute. Seventy-three out of 80 sessions resulted in successful plans, which we considered to be an excellent success rate. However, it's also important to realize that TRAINS didn't really improve on a person doing the same task. In fact, what it really shows is that people can solve simple tasks under duress and eventually coach a reluctant system to get through and get to the solution. So we hadn't succeeded based on our prime evaluation criterion of improving human performance. But we did establish the basic feasibility of the approach.

Having a complete system running also allowed us to do other experiments that can lead to incremental improvements as well as explore other interesting issues. For example, we were interested in how well the robust language processing worked. To explore this, we compared using the system with speech input, which had about a 30-percent error rate in this experiment, and with keyboard input, which had about a 3- to 4-percent error rate due to typos. Interestingly enough, even with the high error rate, the speech-driven sessions took only 66 percent of the time of the keyboard sessions while producing the same quality of plans. And when given the choice, 12 out of 16 people chose speech to interact with the system rather than the keyboard. I'm not saying that speech is better than the keyboard in general for human-computer interaction. We would need to do much more elaborate experiments with alternative interfaces to establish something like that. But it is true that in the TRAINS task, the robustness of the language processing allowed speech to be a viable, in fact a preferred, mode of interaction.
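
For the record, the reported figures work out as follows. This is a trivial check; the 10-minute keyboard session length is a made-up placeholder, since only the 66-percent ratio was reported.

    # Figures as reported in the talk; only the arithmetic is added here.
    sessions, successes = 80, 73
    print(f"plan success rate: {successes / sessions:.2%}")   # 91.25%

    keyboard_minutes = 10.0                   # hypothetical baseline length
    speech_minutes = 0.66 * keyboard_minutes  # speech took 66% of the time
    print(f"speech {speech_minutes:.1f} min vs keyboard {keyboard_minutes:.1f} min")

    print(f"chose speech: {12 / 16:.0%}")     # 12 of 16 participants = 75%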


With basic feasibility established on the base-level task, we were ready to move on to the next stage of the research. Our goal was to define a task that was still simple but one where we hoped that we could actually show that a person with the system can improve on human performance; that is, it has got just enough complexity. The result was a new task of moving cargo and personnel around from different locations under time and cost constraints, with multiple modes of transportation. There's increased opportunity for complex interactions because you can move things, drop them off somewhere, and have something else pick them up. Options can be compared based on cost and time. In addition, the world might change during planning, so we might have a plan almost constructed and then new information comes in that changes the situation. The need to handle a more complex domain forced us to handle much richer language phenomena that arose because of the more complex task.

Figure 2. A Map Used in the TRAINS System. (The map shows a rail network connecting cities in the eastern United States and Canada.)

We have just completed the first experimental version of TRIPS (the Rochester interactive planning system). A typical problem in TRIPS involves an island, and the task is to evacuate all the people off the island using some available vehicles before a storm hits. By varying the number of people, the types of transports available, and the time before the storm hits, we can generate a wide range of problems of varying difficulty. Figure 3 shows a screen shot from a session in which a plan is partially developed. Currently, TRIPS runs reliably only on scripted sessions where we have planned in advance the interactions that will occur. However, everything the system does is real. After evaluating TRIPS, we will then continue this process, improving the system, generalizing the tasks and domains, and so on. Along the way, new research problems arise that we would not have thought of otherwise, and the evolving system serves as an excellent test bed for supporting a wide range of research in collaborative planning, interactive dialogue, and spoken-language understanding.

Figure 3. Developing a Plan with the TRIPS System.

Questions

There are a few questions that I believe people will ask about what I've been saying.

Isn't Building Systems an Awful Lot of Work?

The answer is, yes, but so is building bridges. AI won't develop without systems, just as civil engineering would not exist unless someone built bridges. This doesn't mean that everybody has to build systems. In civil engineering, since bridge construction is well understood, the field can progress using controlled experimentation and theoretical analysis. But it's still ultimately grounded back to actual bridge construction. And that's where everything ultimately gets tested.

Are You Saying That the Last 30 Years of Effort Have Been Wasted in AI?

Absolutely not! In fact, if you look at the TRAINS and TRIPS systems, they build on a long history of AI research, ranging from the seventies, with ideas in chart parsing, STRIPS-style planning, hierarchical planning, and constraint satisfaction; the eighties, with unification-based grammars, incremental semantic interpretation, hierarchical discourse models, temporal reasoning, and plan-recognition algorithms; and the nineties, with statistical language models, robust parsing, and temporally based planning. And there are many others. If this work hadn't been done, we would not have the system today. So this really is an incremental development rather than a radical change from what we've done so far.

Well, System Building Is Applied Work. What I'm Interested in Is the Science, the Theory of AI.

System building is the experimental foundation on which the science of AI is based. If we don't have the experimentation, we don't have the science.

Well, How Can Work Progress, Then, Except for the Few Well-Funded Institutions?

Not everybody has to build full systems. But I would point out that if you work in a subarea of AI where nobody's building systems to test the theories that are being developed, you might consider what your foundations are. But once you have the grounding in some experimental systems, then each subarea can define other relevant methods of evaluation.

And, for example, in the natural language field, there are many different methods now that allow people to do experimentation without building complete systems. There are large, shared databases of annotated corpora, annotated with features that have communitywide agreement about what is crucial for the language-understanding task. There's agreement on some subproblems that are crucial to the entire process, such as extracting sentential structure. And while it would be nice to see more, there is a beginning of sharing of system components, such as speech recognizers, parsers, and other components, where people can actually build systems using other people's work. For instance, in the TRAINS and TRIPS systems, the speech-recognition technology was built at Carnegie Mellon. If we'd had to build that ourselves, we would never have gotten it done.

Parting Thoughts

AI is a young field and faces many complexities because of the intimate relationship between the object of study (namely, intelligent behavior) and people's intuitions about their own intelligence. Despite a growing record of success in the field, many discount this progress by defining the successful work out of the field. Much of this is a result of the lack of an inclusive definition of the field. I have proposed a new definition and have argued that we are at a critical point in the development of the field because we can now construct simple prototype working systems. This new capability opens the door to new methodologies for evaluating work in the field, which I predict will lead to an accelerating rate of progress and development.

Acknowledgments

This article is an edited transcript of my keynote address given in Providence, Rhode Island, at the Fourteenth National Conference on Artificial Intelligence in 1997. I want to thank everyone in the Computer Science Department at the University of Rochester over the years for providing a supportive environment that has encouraged and nurtured my work. The work at Rochester reported here was done by the many dedicated students and researchers in the TRAINS project. References to their work can be found on the TRAINS web page, www.cs.rochester.edu/research/trains. Thanks to Henry Kautz and George Ferguson for helpful comments on a draft of this article. I also want to thank the Defense Advanced Research Projects Agency, the National Science Foundation, the Office of Naval Research, and Rome Laboratories for supporting my work fairly regularly over the years.

James Allen is the John Dessauer Professor of Computer Science at the University of Rochester. He received his Ph.D. in computer science at the University of Toronto and was a recipient of the Presidential Young Investigator Award from the National Science Foundation in 1984. A fellow of the American Association for Artificial Intelligence, he was editor in chief of Computational Linguistics from 1983 to 1993. He is also the author of several books, including Natural Language Understanding, Second Edition (Benjamin Cummings, 1995). His research interests concern the connection between natural language and commonsense reasoning, including work on spoken dialogue, language understanding, knowledge representation, planning, and reasoning. His e-mail address is [email protected].
