The Doctor is in… How does it work? (jdb/ai/wk2-2006.pdf)
RIT Computer Science Dept.
ELIZA
A program written by Joseph Weizenbaum in the 1960s that emulated a Rogerian psychotherapist in dialogue with a patient
– Led at least one psychologist to seriously suggest that computers might help alleviate the shortage of trained psychotherapists
– Weizenbaum later questioned his own work and became one of the field’s biggest critics
ELIZA is…
A reactive agent (a stimulus-response agent)
An example of a simple production system (we will discuss production systems later)
The Doctor is in…
[example of dialog with the doctor program in emacs]
How does it work?
See Weizenbaum’s paper at: http://i5.nyu.edu/~mm64/x52.9265/january1966.html
Slides come from information and text in the paper.
“Input sentences are analyzed on the basis of decomposition rules which are triggered by key words appearing in the input text. Responses are generated by reassembly rules associated with selected decomposition rules.”
The technical problems
1. the identification of keywords
2. the discovery of minimal context
3. the choice of appropriate transformations
4. generation of responses in the absence of keywords
5. the provision of an editing capability for ELIZA "scripts"
ID of Keywords
Keywords may have a RANK or precedence number. The procedure is sensitive to such numbers.
When a keyword has been found and there is a delimiter (period or comma), all subsequent text is deleted from the input message.
The keywords and their transformation rules are the SCRIPT for the conversation class.
Scripts are independent of the program, so the same program can handle English, German, and other languages.
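A minimal sketch of the keyword scan described above, assuming a hypothetical three-word script in which a higher rank means higher precedence. The names `KEYWORD_RANKS` and `select_keyword` are illustrative and do not come from the paper, and delimiters that appear before any keyword are simply skipped here, which is a simplifying assumption.

```python
import re

# Hypothetical mini-script: keyword -> rank (assumed: higher rank wins).
KEYWORD_RANKS = {"computer": 50, "mother": 10, "i": 0}

def select_keyword(message, ranks=KEYWORD_RANKS):
    """Return (highest-ranked keyword or None, text kept for transformation)."""
    kept = []
    best = None
    # Tokenize into words and the two delimiters (period, comma).
    for token in re.findall(r"[A-Za-z']+|[.,]", message.lower()):
        if token in (".", ","):
            if best is not None:
                break        # keyword already found: delete all subsequent text
            continue         # assumption: a delimiter before any keyword is skipped
        kept.append(token)
        if token in ranks and (best is None or ranks[token] > ranks[best]):
            best = token
    return best, " ".join(kept)

print(select_keyword("My mother hates me, and my computer too."))
# ('mother', 'my mother hates me')
```

In the example the scan stops at the comma because "mother" has already been found, so the higher-ranked "computer" is never seen.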
Other Issues
1. The ID of the "most important" keyword
2. The ID of some minimal context within which the chosen keyword appears; e.g., if the keyword is "you", is it followed by the word "are" (in which case an assertion is probably being made).
3. The choice of an appropriate transformation rule, and the transformation itself.
4. To respond "intelligently" when the input text contains no keywords.
5. The provision of machinery that facilitates editing, particularly extension, of the script on the script writing level
An Example Transformation
Consider the sentence:
– "I am very unhappy these days".
Consider somebody who only understood “I am” and responded:
– "How long have you been very unhappy these days?"
This person must have applied a template!
He must also have a reassembly kit:
– "I am BLAH" can be transformed to
– "How long have you been BLAH" independently of the meaning of BLAH.
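The template and reassembly kit can be sketched with a regular expression. This is only an illustration of the idea, not the mechanism of the original program; the names `I_AM` and `respond_i_am` are made up for the example.

```python
import re

# Template: the literal words "I am", then anything (the BLAH part).
# Trailing sentence punctuation is dropped so the reply can add its own.
I_AM = re.compile(r"\bI am\s+(.*?)[.!?]*$", re.IGNORECASE)

def respond_i_am(sentence):
    """Apply the 'I am BLAH' -> 'How long have you been BLAH' rule, if it matches."""
    match = I_AM.search(sentence.strip())
    if match is None:
        return None
    blah = match.group(1)   # copied verbatim; its meaning is never examined
    return "How long have you been {}?".format(blah)

print(respond_i_am("I am very unhappy these days"))
# How long have you been very unhappy these days?
```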
A more complicated example
The sentence:
– "It seems that you hate me".
The words "you" and "me" are understood
A template decomposes the sentence into 4 parts:
1) It seems that
2) you
3) hate
4) me
The reassembly rule might then be:
– "What makes you think I hate you"
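The four-part decomposition and its reassembly rule might look like the following sketch. The regular expression stands in for the template, and `decompose` and `reassemble` are hypothetical helper names.

```python
import re

# Template: anything, the word "you", anything, the word "me".
YOU_ME = re.compile(r"^(.*?)\byou\b\s*(.*?)\s*\bme\b[.!?]*$", re.IGNORECASE)

def decompose(sentence):
    """Split a sentence into the four parts named on the slide, or return None."""
    m = YOU_ME.match(sentence.strip())
    if m is None:
        return None
    return [m.group(1).strip(), "you", m.group(2), "me"]

def reassemble(parts):
    """Reassembly rule: reuse only part 3 of the decomposition."""
    return "What makes you think I {} you".format(parts[2])

parts = decompose("It seems that you hate me")
print(parts)              # ['It seems that', 'you', 'hate', 'me']
print(reassemble(parts))  # What makes you think I hate you
```

Only the third part is carried over into the reply; the first part is discarded, which is why the program does not need to understand it.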
What are the problems?
How can we make the Eliza program fail?
Automated Translation
Failed miserably for years and is still very difficult
⇒ One of the famous mistakes from an early DOD project came from feeding the phrase “the spirit is willing but the flesh is weak” into an English to Russian translator, then feeding the result back into a Russian to English translator to yield “the vodka is strong but the meat is rotten”.
Knowledge Representation and Search
If a computer has the right knowledge, a representation of that knowledge, and a way of indexing that knowledge, then it can be intelligent
Does this idea work?
Physical Symbol System Hypothesis
Intelligent activity in either human or machine requires [Newell & Simon] :
– Symbol patterns to represent significant aspects of a problem domain
– Operations on these patterns to generate potential solutions to problems
– Search to select a solution from among these possibilities
Representation Schemes
Schemes should be [Luger, p. 36]:
Expressive: The scheme must be adequate to express all necessary information
Efficient: Support efficient execution of the resulting code
Natural: Provide a natural scheme for expressing the required knowledge
Representation Schemes
Where a good representation is available, the solution to the problem may be easy.
Sometimes it’s hard to have a good representation – just imagine a representation for common sense knowledge in a natural language system!
An example
Task Description
To write a program that finds, for a given phone number, all possible encodings by words, and prints them. A phone number is an arbitrary(!) string of dashes (-), slashes (/), and digits. The dashes and slashes will not be encoded. The words are taken from a dictionary which is given as an alphabetically sorted ASCII file (one word per line).
Participants: 14 programmers (ave. experience: ~7 yr)
Biggest experimental flaw: subjects were self-selected
Mapping
The following mapping from letters to digits is given:
E | J N Q | R W X | D S Y | F T | A M | C I V | B K U | L O P | G H Z
e | j n q | r w x | d s y | f t | a m | c i v | b k u | l o p | g h z
0 |   1   |   2   |   3   |  4  |  5  |   6   |   7   |   8   |   9
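To give a feel for the task, here is a simplified sketch that uses the mapping above with an in-memory word list instead of a dictionary file. It only finds encodings made entirely of words and ignores any further rules the full task statement may contain; the function names and the sample words are assumptions made for the example.

```python
# Letter groups from the mapping table, keyed to their digit.
MAPPING = {
    "e": "0", "jnq": "1", "rwx": "2", "dsy": "3", "ft": "4",
    "am": "5", "civ": "6", "bku": "7", "lop": "8", "ghz": "9",
}
LETTER_TO_DIGIT = {ch: digit for letters, digit in MAPPING.items() for ch in letters}

def word_to_digits(word):
    """Translate a word to its digit string (case-insensitive, other characters ignored)."""
    return "".join(LETTER_TO_DIGIT[ch] for ch in word.lower() if ch in LETTER_TO_DIGIT)

def encodings(phone, words):
    """Return every way to spell the digits of `phone` as a sequence of words."""
    digits = "".join(ch for ch in phone if ch.isdigit())   # dashes and slashes are not encoded
    by_digits = {}
    for word in words:
        by_digits.setdefault(word_to_digits(word), []).append(word)

    def solve(rest):
        if not rest:
            return [[]]                      # one way to encode nothing
        results = []
        for end in range(1, len(rest) + 1):  # try every prefix as a word
            for word in by_digits.get(rest[:end], []):
                for tail in solve(rest[end:]):
                    results.append([word] + tail)
        return results

    return [" ".join(seq) for seq in solve(digits)]

print(encodings("5624-82", ["mir", "Mix", "Tor"]))
# ['mir Tor', 'Mix Tor']
```

Indexing the dictionary by digit string is the representation choice that makes the search itself short, which is the point of the slide sequence on representation.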
The Results
Using Lisp
Time (hr): 2 to 8.5, ave: 5
Lines of Code: 51 to 182
Run Time (median): 30 seconds
Using C/C++
Time (hr): 3 to 25, ave: 11
Lines of Code: 107 to 614, ave: 277
Run Time (median): 54 seconds
Using Java
Time (hr): 4 to 63, ave: 9
Lines of Code: 107 to 614, ave: 277
Quickest C/C++ program ran faster than the quickest Lisp program
Different Categories
Logical representation schemes. This class of representations uses expressions in formal logic to represent a knowledge base. Inference rules and proof procedures apply this knowledge to problem instances.
Procedural representation schemes. Procedural schemes represent knowledge as a set of instructions for solving a problem. This contrasts with the declarative representations provided by logic and semantic networks. A production rule system is an example of this approach.
Network representation schemes. Network representations capture knowledge as a graph in which the nodes represent objects or concepts in the problem domain and the arcs represent relations or associations between them.
Structured representation schemes. Structured representation languages extend networks by allowing each node to be a complex data structure consisting of named slots with attached values.
Different Representation Schemes
We’ll cover:
– Brooks’ Subsumption Architecture
– Semantic networks
– Frames
– Trees
Brooks’ Hypothesis
Rational behavior does not come from disembodied systems. Intelligence is the product of the interaction between an appropriately layered system and its environment. Intelligent behavior emerges from the interactions of architectures of organized simpler behaviors
Subsumption Architecture
Constructed of augmented finite state machines
Based on production rules where inputs map to actions
Lower level modules combine together and create emergent intelligent behavior
There is no central memory
[example given from Brooks' 1985 paper]
Semantic Networks
Use the associationist view
Represent knowledge as a graph with nodes corresponding to facts or concepts and the arcs to relations or associations between concepts.
Can represent inheritance relationships
Variations used for natural language processing
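As a small illustration of inheritance in a semantic network, the sketch below stores labelled arcs in a dictionary and answers a query by following "isa" arcs upward. The network contents and the `lookup` function are hypothetical.

```python
# Hypothetical toy network: (node, arc label) -> target node or value.
NETWORK = {
    ("canary", "isa"): "bird",
    ("bird", "isa"): "animal",
    ("canary", "color"): "yellow",
    ("bird", "can"): "fly",
    ("animal", "has"): "skin",
}

def lookup(node, relation, network=NETWORK):
    """Find a property on the node itself or inherit it through 'isa' arcs."""
    seen = set()                      # guards against circular 'isa' chains
    while node is not None and node not in seen:
        seen.add(node)
        if (node, relation) in network:
            return network[(node, relation)]
        node = network.get((node, "isa"))   # climb to the more general concept
    return None

print(lookup("canary", "color"))  # yellow (stored on the node itself)
print(lookup("canary", "can"))    # fly    (inherited from bird)
print(lookup("canary", "has"))    # skin   (inherited from animal)
```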
Examples
[shown in class and your book]
Quillian’s Word System [1967]
Program defined English words in terms of other words, as a dictionary does
There were possible circular traversals of the graph
Each node was a word concept and the knowledge base (KB) was organized into planes where each plane was a graph that defined a single word
How it was used
The KB was used to find relationships between pairs of English words. Given two words, it would search the graphs outward from each word in a breadth-first fashion, searching for a common concept or intersection node.
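The intersection search can be sketched as two breadth-first searches that take turns expanding one level each. The tiny graph below is a made-up stand-in for the word planes, and `find_intersection` is a hypothetical name; this is not a reconstruction of Quillian's program.

```python
from collections import deque

# Hypothetical word graph: each word points at words used to define it.
GRAPH = {
    "cry": ["sad", "sound"],
    "comfort": ["sad", "calm"],
    "sad": ["emotion"],
}

def find_intersection(graph, word_a, word_b):
    """Search outward from both words, breadth-first, until a common concept is reached."""
    reached_a, reached_b = {word_a}, {word_b}
    frontier_a, frontier_b = deque([word_a]), deque([word_b])
    while frontier_a or frontier_b:
        for frontier, reached, other in (
            (frontier_a, reached_a, reached_b),
            (frontier_b, reached_b, reached_a),
        ):
            # Expand one whole level on this side before switching sides.
            for _ in range(len(frontier)):
                node = frontier.popleft()
                if node in other:
                    return node          # both searches have reached this concept
                for neighbour in graph.get(node, []):
                    if neighbour not in reached:
                        reached.add(neighbour)
                        frontier.append(neighbour)
    return None

print(find_intersection(GRAPH, "cry", "comfort"))  # sad
```

The `reached` sets also keep the search from looping forever when the graph contains circular definitions.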
Quillian Suggested
This approach to semantics might provide a natural language processing system with the ability to:
– Determine the meaning of a body of English text by building up collections of these intersection nodes
– Choose between multiple meanings of words by finding the meanings with the shortest intersection path to other words in the sentence.
– Answer a flexible range of queries based on associations between word concepts in the queries and concepts in the system
Problems
Graphs were just another notation for relationships
– Many people have worked on building up a richer set of link labels to model language semantics. By implementing semantic relationships as part of the representational scheme rather than as domain knowledge added by the system builder, KBs require less handcrafting
– Can a program be written to reduce sentences to a canonical form?
– There is a computational price to reducing everything to low-level primitives
Frames
Frame system (p. 32):
Each frame:
– Describes an instance or a class
– Has one or more slots, which are assigned slot values
– Slots can be frames
– Can have procedural attachments
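One way to picture these four properties is the minimal `Frame` class below. The class, its `get` method, and the hotel-room data are illustrative assumptions, not an API from the course text.

```python
class Frame:
    """A frame with named slots, an optional class frame, and procedural attachments."""

    def __init__(self, name, parent=None, **slots):
        self.name = name
        self.parent = parent        # class frame that supplies default slot values
        self.slots = dict(slots)

    def get(self, slot):
        frame = self
        while frame is not None:
            if slot in frame.slots:
                value = frame.slots[slot]
                # Procedural attachment: a callable computes the value on demand.
                return value(self) if callable(value) else value
            frame = frame.parent    # fall back to the class frame
        raise KeyError(slot)

hotel_room = Frame("hotel room", location="hotel", beds=1)   # a class frame
bed = Frame("hotel bed", size="king")
room_101 = Frame(
    "room 101",
    parent=hotel_room,              # an instance of the class frame
    bed=bed,                        # a slot whose value is itself a frame
    description=lambda f: "{} with {} bed(s)".format(f.name, f.get("beds")),
)

print(room_101.get("location"))         # hotel
print(room_101.get("bed").get("size"))  # king
print(room_101.get("description"))      # room 101 with 1 bed(s)
```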
Scripts
Many systems require a large amount of background knowledge in order to function
A script is a structured representation describing a stereotyped sequence of events in a particular context.
Pieces of semantic meaning are represented with conceptual dependency relationships
Example
When you enter a classroom, the teacher normally follows a certain script and you as a student do also!
Components of a Script
Entry conditions: Things that must be true for a script to occur
Results: Facts that are true once the script has terminated.
Props: Things that support the content of the script.
Roles: Actions that individual participants perform
Scenes: Represent the temporal aspect of each script
An example?
[class picks an example and we turn it into a script]
Problems?
What kind of problems are there with scripts?
Search Trees
The tree is a representation of a problem that makes it easy for a solution to be found.
Decision trees: yes/no
Goal trees: problem broken into subproblems
Deciding What to Do: Scenario Objectives (Simplified Counter-Terrorist Bomb Example)
[Booth, GDC 2004]
Search
Many problems are search problems
Traveling salesman problem
N-Queens or halftoning?
Object recognition
Internet search or any kind of data mining
Robot navigation
VLSI circuit layout
…
State Space Search
Questions to be answered:
– Is the problem solver guaranteed to find a solution?
– Will the problem solver always terminate or can it become caught in an infinite loop?
– When a solution is found will it always be optimal?
– What is the complexity of the search process in terms of time usage? Memory usage?
Uninformed Searches
Ch. 4 in book
Breadth-first – queue-based
Uniform-cost
Depth-first (depth limited) – stack-based
Iterative deepening depth-first
Bi-directional
Example of navigation shown in class
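The queue-based versus stack-based distinction can be shown with a single routine in which only the choice of which end of the frontier to remove from changes. This is a sketch on a made-up graph; the function name `search` and the node labels are assumptions.

```python
from collections import deque

def search(graph, start, goal, breadth_first=True):
    """Return a path from start to goal, or None if the goal is unreachable."""
    frontier = deque([[start]])     # each entry is the path taken so far
    visited = {start}
    while frontier:
        # Queue (take the oldest entry) gives breadth-first;
        # stack (take the newest entry) gives depth-first.
        path = frontier.popleft() if breadth_first else frontier.pop()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append(path + [neighbour])
    return None

GRAPH = {"A": ["B", "C"], "B": ["G"], "C": ["D"], "D": ["G"]}

print(search(GRAPH, "A", "G", breadth_first=True))   # ['A', 'B', 'G']
print(search(GRAPH, "A", "G", breadth_first=False))  # ['A', 'C', 'D', 'G']
```

On this graph breadth-first returns the shortest path, while depth-first commits to the last branch it pushed and returns a longer one.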
Basic Search Problem
All search techniques so far have a worst case exponential time complexity
A full search may take too long!
– Chess has a possible 10^120 game paths
– Checkers has a possible 10^40 game paths
Comparing Search Techniques (Russell & Norvig, p. 81)
Technique           | Complete? | Time            | Space           | Optimal?
Breadth-first       | Yes       | O(b^(d+1))      | O(b^(d+1))      | Yes
Uniform-cost        | Yes       | O(b^(1+[C*/ε])) | O(b^(1+[C*/ε])) | Yes
Depth-first         | No        | O(b^m)          | O(bm)           | No
Depth-limited       | No        | O(b^l)          | O(bl)           | No
Iterative deepening | Yes       | O(b^d)          | O(bd)           | Yes
Bidirectional       | Yes       | O(b^(d/2))      | O(b^(d/2))      | Yes
Examples
Trees for common 2-player games
– Tic-tac-toe has a possible 9! (= 362,880) game paths (this is why tic-tac-toe tends to be used in brute force search examples)
– The Eight Puzzle is another game example
Different Ways of Problem Solving
Data-driven: forward chaining
– Starts with facts and attempts to make a path to the goal
Goal-driven: backward chaining
– Starts from the goal and goes backwards towards the initial facts
What is preferred?
– It depends
An Example
Say I want to confirm/deny that “I am a descendant of Thomas Jefferson”.
– He was born around 250 years ago and we assume 25 yr. per generation. We also assume that people generally have more children than parents (say an average of 3 children)
– The required path back would be around 10 generations. If we assume 2 parents for each person, then there are 2^10 possible states to search.
– The required path forward would involve around 3^10 states
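The arithmetic behind the two estimates, written out as a short check. The constant names are only labels for the assumptions stated on the slide, and both counts refer to the size of the tenth generation alone.

```python
YEARS_BACK = 250            # Jefferson was born roughly this long ago
YEARS_PER_GENERATION = 25   # assumed generation length
PARENTS_PER_PERSON = 2
CHILDREN_PER_PERSON = 3     # assumed average number of children

generations = YEARS_BACK // YEARS_PER_GENERATION

# Goal-driven: start from "me" and branch over parents.
states_backward = PARENTS_PER_PERSON ** generations
# Data-driven: start from Jefferson and branch over children.
states_forward = CHILDREN_PER_PERSON ** generations

print(generations)      # 10
print(states_backward)  # 1024
print(states_forward)   # 59049
```

Under these assumptions the backward search touches far fewer states, simply because its branching factor is smaller.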
Which one?
Goal-driven search is suggested if:
– Goal-hypothesis is given in the problem and is easily formulated. Example: a theorem prover
– There are a large number of rules that match the facts of the problem and thus produce an increasing number of conclusions or goals.
– Problem data are not given but must be acquired by the problem solver. Example: a medical diagnosis system where diagnostic tests are ordered to confirm/deny a particular hypothesis
The other?
Data-driven search is suggested if:
– All or most of the data are given in the initial problem statement. Systems that analyze data fall into this category
– There are a large number of possible goals, but only a few ways to use the facts. Example: DENDRAL, an expert system that finds the molecular structure of organic compounds based on their formula, mass spectrographic data, and knowledge of chemistry
– It’s difficult to form a goal or hypothesis
Examples
Trees for common 2-player games
– Tic-tac-toe has a possible 9! (= 362,880) game paths (this is why tic-tac-toe tends to be used in brute force search examples)
– The Eight Puzzle is another game example
What about solving mazes using trees?
[example from book shown]
Basic Search Problem
All search techniques so far have a worst case exponential time complexity
A full search may take too long!
– Chess has a possible 10^120 game paths
– Checkers has a possible 10^40 game paths
Considering Complexity in Search
10^120 is far larger than the number of molecules in the universe!
If the branching factor of a tree (ave. number of branches for each parent) can be cut down, then the tree can be searched deeper
– This is easy to do for tic-tac-toe when you consider the symmetry of the board
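A sketch of how board symmetry cuts the branching factor: boards that are rotations or mirror images of each other are reduced to one canonical representative. The board encoding (a tuple of 9 cells, row by row) and the function names are assumptions made for this example.

```python
def rotate(board):
    """Rotate a 3x3 board (tuple of 9 cells, row by row) by 90 degrees clockwise."""
    return tuple(board[3 * (2 - c) + r] for r in range(3) for c in range(3))

def mirror(board):
    """Reflect a board left to right."""
    return tuple(board[3 * r + (2 - c)] for r in range(3) for c in range(3))

def symmetries(board):
    """All 8 rotations and reflections of a board."""
    result = []
    for b in (board, mirror(board)):
        for _ in range(4):
            result.append(b)
            b = rotate(b)
    return result

def canonical(board):
    """Pick one fixed representative for a whole symmetry class."""
    return min(symmetries(board))

def distinct_first_moves():
    """Count opening moves for X that differ after symmetry is taken into account."""
    empty = (" ",) * 9
    classes = set()
    for i in range(9):
        board = list(empty)
        board[i] = "X"
        classes.add(canonical(tuple(board)))
    return len(classes)

print(distinct_first_moves())  # 3 (corner, edge, center) instead of 9
```

Storing only canonical boards in the search tree means equivalent positions are expanded once rather than up to eight times.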
Heuristic Search
“Intelligence for a system with limited processing resources consists in making wise choices of what to do next…”
– Newell and Simon, 1976, Turing Award Lecture
Consider Two Problems
Medical Diagnosis
Chess
Two Basic Situations
A problem may not have an exact solution because of inherent ambiguities in the problem statement or available data. Medical diagnosis is an example of this. Heuristics are used to choose the most likely diagnosis and formulate a plan of treatment.
A problem may have an exact solution, but the computational cost may be prohibitive. An example here is chess.
Comments on the Nature of Heuristics
Heuristics have limited information and are seldom able to predict the exact behavior of the state space farther along in the search – can lead to sub-optimal solutions
Heuristic algorithms have two parts:
– The heuristic measure
– An algorithm that uses it to search the state space
Hill Climbing
The simplest thing to do is to climb a hill by the steepest path possible
Hill climbing strategies expand the current state, evaluate its children, and continue only from the best child. The other siblings are ignored.
What could go wrong here?
Local Maxima
What if you have to take a winding path (it goes down and sometimes up) to get to the top of the hill?
– Many algorithms in AI suffer from the problem of getting stuck in local maxima.
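A minimal hill-climbing sketch on a one-dimensional "landscape" shows how the search gets stuck. The height list, the neighbour function, and the name `hill_climb` are invented for the illustration.

```python
def hill_climb(start, neighbours, score):
    """Move to the best-scoring neighbour until no neighbour improves on the current state."""
    current = start
    while True:
        best = max(neighbours(current), key=score, default=None)
        if best is None or score(best) <= score(current):
            return current      # a maximum, but possibly only a local one
        current = best          # siblings and the parent are forgotten

# Hypothetical landscape: positions 0..7 with these heights.
HEIGHTS = [1, 3, 5, 4, 2, 6, 9, 7]

def neighbours(i):
    return [j for j in (i - 1, i + 1) if 0 <= j < len(HEIGHTS)]

def score(i):
    return HEIGHTS[i]

print(hill_climb(0, neighbours, score))  # 2 (height 5: stuck on a local maximum)
print(hill_climb(4, neighbours, score))  # 6 (height 9: the global maximum)
```

Starting from position 0, the climber stops at height 5 because reaching height 9 would first require going downhill.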
Best-first Search
Local maxima are hopefully avoided by backtracking
Uses two lists:
– An open list: keeps track of the current fringe of the search
– A closed list: keeps track of states already visited
– The algorithm orders states on the open list according to their “closeness” to a goal
– This amounts to a priority queue for the search
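The open and closed lists can be sketched with Python's `heapq` module serving as the priority queue. The graph, the heuristic values, and the function name are hypothetical, and this greedy variant orders states by the heuristic estimate alone.

```python
import heapq

def best_first_search(start, goal, neighbours, heuristic):
    """Always expand the open state that the heuristic rates closest to the goal."""
    open_list = [(heuristic(start), start, [start])]   # priority queue: the fringe
    closed = set()                                     # states already expanded
    while open_list:
        _, state, path = heapq.heappop(open_list)
        if state == goal:
            return path
        if state in closed:
            continue
        closed.add(state)
        for nxt in neighbours(state):
            if nxt not in closed:
                heapq.heappush(open_list, (heuristic(nxt), nxt, path + [nxt]))
    return None

# Hypothetical graph and heuristic estimates (lower means closer to the goal G).
GRAPH = {"S": ["A", "B"], "A": ["C"], "B": ["G"], "C": [], "G": []}
H = {"S": 3, "A": 1, "B": 2, "C": 4, "G": 0}

print(best_first_search("S", "G", lambda s: GRAPH[s], lambda s: H[s]))
# ['S', 'B', 'G']
```

The heuristic first lures the search to A, but because B is still on the open list the search can fall back to it once A's child looks worse, which is the backtracking mentioned above.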
Best-First Search
When the frontier of the search is uneven, best-first search can work well
Admissibility
Admissibility: A search algorithm is admissible if it is guaranteed to find the shortest path to a goal whenever such a path exists.
Breadth-first search is admissible as it examines all possible states at level n before going to the next level