Transcript of Part I: What is intelligence and how can it be...


Part I: What is intelligence and how can it be 'embodied'?

The Theory of Intelligence

I don't think I've ever heard the concept explained any better than this:

'Well you see, Norm, it's like this . . A herd of buffalo can only move as fast as the slowest buffalo. And when the herd is hunted, it is the slowest and weakest ones at the back that are killed first. This natural selection is good for the herd as a whole, because the general speed and health of the whole group keeps improving by the regular killing of the weakest members. In much the same way, the human brain can only operate as fast as the slowest brain cells. Now, as we know, excessive intake of alcohol kills brain cells. But naturally, it attacks the slowest and weakest brain cells first. In this way, regular consumption of beer eliminates the weaker brain cells, making the brain a faster and more efficient machine. And that, Norm, is why you always feel smarter after a few beers.'


Enormous Breadth of Ideas

•  Classical Views
•  Psychometrics of Intelligence
•  Physical Symbol Systems Hypothesis
•  Control/Information Theory Basis for Intelligence
•  Sense-Model-Plan-Act Cycle
•  Knowledge-Based Systems
•  Behaviorism
•  Behavior-based AI
•  Connectionist Models
•  Neural Network Approaches
•  Cognitive Science Systems
•  Big Data and Correlation
•  Learning and Intelligence

Classical Views on Intelligence

Aristotle (384–322 BC): the body and the mind exist as facets of the same being, with the mind being simply one of the body's functions. He suggests that intellect consists of two parts: something similar to matter (passive intellect) and something similar to form (active intellect).

intellect is separable, impassible, unmixed, since it is in its essential nature activity. . . . When intellect is set free from its present conditions, it appears as just what it is and nothing more: it alone is immortal and eternal . . . and without it nothing thinks

He describes the psyche as a substance able to receive knowledge. Knowledge is obtained through the psyche's capability of intelligence, although the five senses are also necessary to obtain knowledge. Senses receive the form of sensible objects without the matter, just as the wax receives the impression of the signet-ring without the iron or the gold.

memory is the persistence of sense impressions.

Thinking requires the use of images. While some animals can imagine, only man thinks. Knowing differs from thinking in that it is an active, creative process leading to the recognition of universals; it is akin to intuition, it does not cause movement, and it is independent of the other functions of the psyche.


Classical Views on Intelligence

Defining intelligence is controversial; two widely cited characterizations:

A very general mental capability that includes the ability to reason, plan, solve problems, think abstractly, comprehend complex ideas, learn quickly and learn from experience. It is not merely book learning, a narrow academic skill, or test-taking smarts. Rather, it reflects a broader and deeper capability for comprehending our surroundings.

Individuals differ from one another in their ability to understand complex ideas, to adapt effectively to the environment, to learn from experience, to engage in various forms of reasoning, to overcome obstacles by taking thought. Although these individual differences can be substantial, they are never entirely consistent: a given person's intellectual performance will vary on different occasions, in different domains, as judged by different criteria. Concepts of intelligence are attempts to clarify and organize this complex set of phenomena. Although considerable clarity has been achieved in some areas, no such conceptualization has yet answered all the important questions, and none commands universal assent.

What is Intelligence? (1)

Intelligence is a combination of the ability to:

1. Learn. This includes all kinds of informal and formal learning via any combination of experience, education, and training.

2. Pose problems. This includes recognizing problem situations and transforming them into more clearly defined problems.

3. Solve problems. This includes solving problems, accomplishing tasks, fashioning products, and doing complex projects.

This definition of intelligence is a very optimistic one. It says that each of us can become more intelligent. We can become more intelligent through study and practice, through access to appropriate tools, and through learning to make effective use of these tools (Perkins, 1995).


What is Intelligence? (2)

[Figure: Gardner's (1983) theory of multiple intelligences]

What is Intelligence? (3)

Sternberg (1988, 1997) focuses on three main components:

1. Practical intelligence--the ability to do well in informal and formal educational settings; adapting to and shaping one's environment; street smarts.

2. Experiential intelligence--the ability to deal with novel situations; the ability to effectively automate ways of dealing with novel situations so they are easily handled in the future; the ability to think in novel ways.

3. Componential intelligence--the ability to process information effectively. This includes meta-cognitive, executive, performance, and knowledge acquisition components that help to steer cognitive processes.


Empirical View of Intelligence (Psychometrics)

Common measure is the well-known IQ test – one (of many) is the Wechsler Adult Intelligence Scale (Wechsler 1955):

Information – general knowledge/facts
Picture Completion – how to complete pictures to make them veridical
Digit Span – repeat (forward/backward) a sequence of numbers
Picture Arrangement – infer correct temporal order of a picture sequence
Vocabulary – definitions of words
Block Design – build given configurations from blocks
Arithmetic – mental addition, subtraction, multiplication, division
Object Assembly – assemble several puzzle pieces into complete figures
Comprehension – questions about universal concepts and interpretations
Digit-Symbol – tests learning of novel digit-symbol relationships
Similarities – underlying constructs common to two objects or concepts

Artificial Intelligence

The approach is based on the assumption that many aspects of intelligence can be achieved by the manipulation of symbols, an assumption defined as the "physical symbol systems hypothesis" by Allen Newell and Herbert Simon in the middle 1960s.

By the 1980s, many researchers began to doubt that high-level symbol manipulation alone could account for all intelligent behaviors. Opponents of the symbolic approach include roboticists such as Rodney Brooks, who aims to produce autonomous robots without symbolic representation (or with only minimal representation) and computational intelligence researchers, who apply techniques such as neural networks and optimization to solve problems in machine learning and control engineering. Now, both approaches are in common use, often applied to different problems.

GOFAI was intended to produce general, human-like intelligence in a machine, whereas most modern research is directed at specific sub-problems.


Physical Symbol Systems Hypothesis

Claim: that human thinking is a kind of symbol manipulation (because a symbol system is necessary for intelligence) and that machines can be intelligent (a symbol system is sufficient for intelligence).

The idea has philosophical roots in:

Hobbes (1651), who claimed reasoning was "nothing more than reckoning"
Leibniz (1666), who attempted to create a logical calculus of all human ideas
Hume (1739), who thought perception could be reduced to "atomic impressions" that can be grouped and arranged
Kant (1781), who analyzed all experience as controlled by formal rules.

What does this look like?
•  Formal logic
•  Any form of mathematics
•  Any computer program


Criticisms

The general classes of criticisms:

– the physical symbol system hypothesis lacks symbol grounding (how are symbols connected to the things they refer to?), which is presumed to be a requirement for general intelligent action.

– the common belief that intelligence requires non-symbolic processing.

– the common statement that the brain is simply not a computer, and that computation as it is currently understood does not provide an appropriate model for intelligence.

– the belief that the brain is essentially mindless, that most of what takes place are chemical reactions, and that human intelligent behaviour is analogous to the intelligent behaviour displayed, for example, by ant colonies.

Knowledge-Based Systems

Knowledge-based expert systems employ knowledge to perform tasks that ordinarily require human expertise at expert levels of performance. Examples of the early, well-known systems include:

MYCIN: 400 heuristic rules written in English-like if-then form to diagnose and treat infectious blood diseases and explain its conclusions (Shortliffe 1976)

HEARSAY-II: multiple, independent, cooperating expert systems that communicated through a global database called a blackboard to understand connected speech in a 1000-word vocabulary (Erman et al 1980)

R1: incorporated 1000 if-then rules to configure orders for DEC’s VAX computer (McDermott 1980)

INTERNIST: contained nearly 100,000 judgments about relationships between diseases and symptoms in internal medicine (Pople 1975)
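
To make the flavor of such systems concrete, here is a minimal sketch of forward chaining over English-like if-then rules. It is not code from MYCIN, R1 or any real system; the rules and facts are invented for illustration, and real systems additionally handle certainty factors and can explain their reasoning.

```python
# Minimal forward-chaining rule engine in the spirit of if-then expert systems.
# Rules and facts are invented for illustration only.

rules = [
    ({"gram_negative", "rod_shaped"}, "enterobacteriaceae"),
    ({"enterobacteriaceae", "hospital_acquired"}, "suspect_klebsiella"),
]

def forward_chain(facts, rules):
    """Repeatedly fire any rule whose conditions are all satisfied."""
    facts = set(facts)
    changed = True
    while changed:
        changed = False
        for conditions, conclusion in rules:
            if conditions <= facts and conclusion not in facts:
                facts.add(conclusion)          # assert the newly derived fact
                changed = True
    return facts

print(forward_chain({"gram_negative", "rod_shaped", "hospital_acquired"}, rules))
# -> the derived facts include 'suspect_klebsiella' after two rule firings
```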


Control/Information Theory-Based Intelligence

Kalman filter – uses measurements observed over time, containing noise (random variations) and other inaccuracies, and produces values that tend to be closer to the true values of the measurements and their associated calculated values.

The Kalman filter uses a system's dynamics model (i.e., physical laws of motion), known control inputs to that system, and measurements (such as from sensors) to form an estimate of the system's varying quantities (its state) that is better than the estimate obtained by using any one measurement alone. As such, it is a common sensor fusion algorithm.
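
A minimal one-dimensional sketch of the idea, assuming a constant underlying quantity; the process and measurement variances (q, r) and the noise level are made up for illustration, not taken from any real sensor. The filter alternates a predict step (dynamics model) and an update step that blends the prediction with the noisy measurement.

```python
# 1-D Kalman filter sketch: estimate a constant quantity from noisy readings.
import random

def kalman_1d(measurements, q=1e-4, r=0.25, x0=0.0, p0=1.0):
    x, p = x0, p0                    # state estimate and its variance
    estimates = []
    for z in measurements:
        p = p + q                    # predict: state assumed constant, uncertainty grows
        k = p / (p + r)              # Kalman gain: how much to trust the new measurement
        x = x + k * (z - x)          # update estimate toward the measurement
        p = (1.0 - k) * p            # reduced uncertainty after the update
        estimates.append(x)
    return estimates

true_value = 5.0
noisy = [true_value + random.gauss(0, 0.5) for _ in range(50)]
print(kalman_1d(noisy)[-1])          # typically much closer to 5.0 than any single reading
```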

Markov Chain

Markov Chain – a system that undergoes transitions from one state to another, between a finite or countable number of possible states. It is a discrete-time random process characterized as memoryless (the Markov property): the next state depends only on the current state and not on the sequence of events that preceded it.

Appears as a sequence of random variables X1, X2, X3, ... where the Markov property holds: Pr(X_{n+1} = x | X_1 = x_1, X_2 = x_2, ..., X_n = x_n) = Pr(X_{n+1} = x | X_n = x_n).

The possible values of Xi form a countable set S called the state space of the chain.

Markov chains are often described by a directed graph, where the edges are labeled by the probabilities of going from one state to the other states.

The term Markov chain Monte Carlo methodology covers cases where the process is in discrete time (discrete algorithm steps) with a continuous state space.

Examples: games determined by dice, Google's PageRank (but the setting of probabilities is key!)
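
As a small illustration, a Markov chain can be simulated directly from its transition probabilities; the two-state "weather" chain below and its probabilities are invented for the example. Note that each step looks only at the current state, never at the history.

```python
# Simulate a toy two-state Markov chain (states and probabilities are illustrative).
import random

transitions = {
    "sunny": [("sunny", 0.8), ("rainy", 0.2)],
    "rainy": [("sunny", 0.4), ("rainy", 0.6)],
}

def step(state):
    """Choose the next state using only the current state (Markov property)."""
    r, acc = random.random(), 0.0
    for nxt, p in transitions[state]:
        acc += p
        if r < acc:
            return nxt
    return nxt

state, counts = "sunny", {"sunny": 0, "rainy": 0}
for _ in range(10_000):
    state = step(state)
    counts[state] += 1
print(counts)   # long-run frequencies approach the stationary distribution (about 2/3 sunny)
```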


Sense-Model-Plan-Act Cycle

•  The robot senses the world, plans the next action, acts; at each step the robot explicitly plans the next move.

•  All the sensing data tends to be gathered into one global world model.

Sense → Model → Plan → Act

An elaborated view of SMPA
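
A minimal sketch of the cycle's control flow; every function below is an invented placeholder rather than any particular robot API. The point is that all sensing is folded into one global world model before each planning step, and the robot re-plans on every cycle.

```python
# Sense-Model-Plan-Act skeleton; sense/plan/act are stand-in placeholders.

def sense():
    """Read all sensors; in SMPA everything is gathered centrally."""
    return {"obstacle_ahead": False, "goal_bearing": 0.0}

def update_model(world_model, percept):
    world_model.update(percept)          # one global world model
    return world_model

def plan(world_model):
    """Deliberate over the model and return the next action."""
    return "turn" if world_model["obstacle_ahead"] else "forward"

def act(action):
    print("executing:", action)

world_model = {}
for _ in range(3):                       # each step re-senses, re-models, re-plans
    percept = sense()
    world_model = update_model(world_model, percept)
    act(plan(world_model))
```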


Reactive Control

The robot has multiple instances of Sense-Act couplings. These couplings are concurrent processes, called behaviours, which take the local sensing data and compute the best action to take independently of what the other processes are doing.
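
A minimal sketch of such independent sense-act couplings, with one simple way of arbitrating among them (a fixed-priority, subsumption-style pick of the highest-priority behaviour that fires). The behaviours and sensor fields are invented for illustration; real systems differ in how they combine the concurrent behaviours.

```python
# Reactive control sketch: each behaviour maps local sensing directly to an action,
# independently of the others; a fixed-priority arbiter picks the winner.

def avoid(sensors):            # highest priority: keep from hitting things
    if sensors["range_cm"] < 20:
        return "turn_away"

def follow_wall(sensors):      # middle priority
    if sensors["wall_seen"]:
        return "hug_wall"

def wander(sensors):           # lowest priority: always proposes something
    return "wander"

behaviours = [avoid, follow_wall, wander]   # ordered by priority

def arbitrate(sensors):
    for behaviour in behaviours:
        action = behaviour(sensors)
        if action is not None:              # first (highest-priority) proposal wins
            return action

print(arbitrate({"range_cm": 12, "wall_seen": True}))   # -> turn_away
print(arbitrate({"range_cm": 80, "wall_seen": False}))  # -> wander
```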

Hybrid Deliberate/Reactive Strategy

The robot first plans (deliberates) how to best decompose a task into subtasks (also called "mission planning") and then which behaviours are suitable to accomplish each subtask. Then the behaviours start executing as per the Reactive Paradigm. Sensing organization is also a mixture of Hierarchical and Reactive styles; sensor data gets routed to each behaviour that needs that sensor, but is also available to the planner for construction of a task-oriented global world model.


Hierarchical Control

Behaviour: What an Observer sees

After a couple of decades of AI research, Brooks, Moravec and others modeled basic skills such as motion, survival, perception, balance without high level symbols and claimed that the use of high level symbols was more complicated and less successful.

In a 1990 paper Elephants Don't Play Chess, Brooks challenged the physical symbol system hypothesis saying symbols are not always necessary:

"the world is its own best model. It is always exactly up to date. It always has every detail there is to be known. The trick is to sense it appropriately and often enough.”


Behaviorist Intelligence

In psychology (Watson 1913), behaviorism stood for one basic belief: humans are biological machines and as such do not consciously act, do not have their actions determined by thoughts, feelings, intentions or mental processes. Human behavior is a product of conditioning: humans react to stimuli.

Behaviorism comprises the position that all theories should have observational correlates but that there are no philosophical differences between publicly observable processes (such as actions) and privately observable processes (such as thinking and feeling).

Main influences: Pavlov, who investigated classical conditioning although he did not necessarily agree with behaviorism or behaviorists, Thorndike, Watson who rejected introspective methods and sought to restrict psychology to experimental methods, and Skinner who conducted research on conditioning.

Behaviorism in AI

In AI, Rod Brooks rejuvenated these ideas in his classic papers: A robust layered control system for a mobile robot, IEEE Journal of Robotics and Automation, 1986; Intelligence without representation, Artificial Intelligence, 1991; Intelligence without reason, Artificial Intelligence: Critical Concepts, 1991; Elephants don't play chess, Robotics and Autonomous Systems, 1990.

"....that intelligence be reactive to dynamic aspects of the environment, that a mobile robot operate on time scales similar to those of animals and humans, and that intelligence be able to generate robust behavior in the face of uncertain sensors, an unpredicted environment and a changing world....."

- R. Brooks, Computers and Thought Lecture, International Joint Conference on Artificial Intelligence, Sydney, August 1991

The basic principles:
Situatedness......... the world is its own best model
Embodiment......... the real world grounds regress
Intelligence........... intelligence is determined by the dynamics of interaction with the world
Emergence............ intelligence is in the eye of the observer


Interesting Quotes

–  "I reject traditional Artificial Intelligence representation schemes. I also made it clear that I reject explicit representations of goals within the machine." (p588, Brooks 1991b).

–  "Just as there is no central representation there is not even a central system. Each activity producing layers connects perception to action directly" (p148, Brooks 1991a).

–  "There is no purposeful central locus of control” (p149, Brooks 1991a). –  "There are no rules which need to be selected through pattern matching. There are

no choices to be made." (p149, Brooks 1991a). –  "There is no hierarchical arrangement" (p586, Brooks 1991b). –  "I think that the new approach can be extended to cover the whole story, both with

regards to building intell igent systems and to understanding human intelligence” (p585, Brooks 1991b).

Brooks, R. (1991a). Intelligence without representation. Artificial Intelligence 47, 139–159.

Brooks, R. (1991b). Intelligence without reason. Proc. IJCAI, Sydney, Australia, August, pp. 569–595.


Strongly Related Proposals

Ballard (1990) : animate vision

Ramachandran (1990) : utilitarian theory

Aloimonos and Rosenfeld (1992) : purposive and qualitative vision

– the strict behaviorist position for the modeling of intelligence does not scale to human-like problems and performance.

– Need: intermediate representations, goal-directed processing, hierarchical organization, attentive processes

See Kirsh (1991) for a qualitative argument; here the focus is on a quantitative and biologically motivated viewpoint.

However.... Tsotsos, J.K., On Behaviorist Intelligence and the Scaling Problem, Artificial Intelligence 75, p135 - 160, 1995.


What does "scales up" mean in this context?

A computational theory scales to human-like problem sizes if:

• the algorithm that embodies the theory accepts up to the same number of input samples of the world per unit time as human sensory organs
• the implementation that realizes the algorithm exists in the real world and requires amounts of physical resources which exist
• the output behavior of the implementation as a result of those stimuli is comparable in quality to human behavior and requires time that is acceptable in some practical sense

A computational theory scales to human-like problem sizes in a biologically plausible manner if, in addition:

• solutions should require significantly fewer than about 10^9 processors operating in parallel, each able to perform one multiply-add operation over all its input per millisecond
• processor average fan-in and fan-out should be about 1000 overall
• solutions should not involve more than a few hundred sequential processing steps (may be much less!)
• the output behavior of the implementation as a result of those stimuli is comparable in timing to human behavior

An abstraction of Connell's architecture

[Diagram: stimuli from the world pass through a stimulus transducer to a set of stimulus-action pairs (S-A #1 ... S-A #5); their outputs go to a resolve-conflicts stage that produces the action]

Visual search

Visual search can be viewed within the behaviorist paradigm; it is an integral part of any intelligent behavior.

a subject is presented with a target and a test image. This involves 2 behaviors (or stimulus-action pairs): 1) if target present, press button A; 2) if target absent, press button B.


Theorems

Theorem 1: Unbounded Passive Visual Search is NP-Complete.

Theorem 2: Bounded Passive Visual Search has linear time complexity in the number of input image locations.

Theorem 3: Unbounded Active Visual Search is NP-Complete.

Theorem 4: Bounded Active Visual Search has linear time complexity in the number of input image locations.

Perception within Behaviorism

Three points:

Perception is not easy

All Perceptual Acts are not Externally Observable

Biological Realities


Perception is not Easy

Behaviorists imply that perception is easy*:

1) a set of primitives exists that can be used as input to behavior modules, and these primitives are next to trivial to compute.

2) the world can be used as memory, and visual information can be rapidly re-acquired on demand.

* Some logicians make similar unreasonable assumptions about perception.

Reality (see Treisman, Julesz, etc.):

I) If the target (and position) is known, and it is distinguished from all distractors in an obvious manner (colour for example), then detection, localization and recognition can be done in time independent of the number of items in the display.

II) If the target (but not position) is known, and it is distinguished from all distractors in an obvious manner, then detection, localization and recognition can be done in time independent of the number of items in the display, but is slightly slower than case I.

III) If the target (but not position) is known, and it is not distinguished uniquely in a simple way, then detection may require up to linear time in the number of display items.

IV) If the target (and position) is known, and it is not distinguished uniquely in a simple way, then detection may require time up to linear in the number of display items, but is much faster than in type III.

V) If the target is not known, but is distinguished in an obvious manner, then detection and localization may be done in time linear in the number of display items (for example, texture pop-out).

VI) Otherwise, detection and localization may require anything from a high-slope linear function to exponential time.

- the embodiment principle seems to be working against behaviorism here.


All Perceptual Acts are not Externally Observable

Visual search behavior is due in part to a form of attention termed "covert", that is, unobservable externally (Posner 1980).

Humans seem to be able to acquire a new covert fixation every 30–50 ms.

– the intelligence and emergence principles seem to be working against behaviorism here.

Biological Realities

Three behaviorists' claims go against biology:
– there are no intermediate representations;
– there are no hierarchical computations;
– there is no explicit representation of goals.

"It was easy to show that any study of mental activity that failed to consider representations of mental events was inadequate to account for all but the simplest forms of behavior" (Kandel and Squire 1992 p. 143).

- the situatedness principle seems to be working against behaviorism here.


Behaviorism and the Scaling Problem

There are four major components to a behaviorist solution (call the problem Stimulus-Behavior Search):

localize and recognize a stimulus (with no pre-specified target); link the stimulus to an applicable action; decide among all possible actions; generate actuator commands.

Theorem 5: Unbounded Passive Stimulus-Behavior Search is NP-hard.

Theorem 6: Unbounded Active Stimulus-Behavior Search is NP-hard.

Some numerology......... motivated by Connell's actual system implementation....

10,000,000 behaviors (for each of 25,000 objects: 5 different lighting conditions, 5 different time-varying contexts, 2 different colors, 4 different spatial contexts, and 2 different behaviors for each condition)

1. Each behavior looks at the whole scene: with 2 retinas totalling 250,000,000 photoreceptors, total connectivity would be 2.5 × 10^15, about the number of synapses in 250 human brains....not possible

2. Cut down connectivity by eliminating duplicate effort or divide-and-conquer strategies – intermediate representation....not permitted

3. Active perception......requires intermediate hypotheses by definition....not permitted


4. Connectivity of priority network: the output of the 10,000,000 behaviors must now be processed such that the most appropriate behavior is the overall system response. A priority network is needed for deciding this. Each behavior may spy on and inject signals into any other lower-level behavior. Assume L levels, each with an equal number of behaviors, and that each behavior on average only affects 1/n-th of all the behaviors in the levels below it:

Σ_{x=1}^{L-1} (10,000,000 / L) × (10,000,000 (L − x) / (n L)) = 10^14 (L − 1) / (n L)

If the number of behaviors is represented by B, then B²(L−1)/(nL) gives the number of connections required for the priority network

.....not possible except for large n which defeats the purpose of the network
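
These counts can be checked directly. The sketch below simply reproduces the slide's own figures and then evaluates B²(L−1)/(nL) for a few illustrative choices of L and n (the particular L, n values are assumptions made only to show the scale).

```python
# Reproduce the numerology quoted above.
objects, lighting, contexts, colors, spatial, per_condition = 25_000, 5, 5, 2, 4, 2
behaviours = objects * lighting * contexts * colors * spatial * per_condition
print(behaviours)                              # 10,000,000 behaviours

photoreceptors = 250_000_000                   # two retinas combined, as stated above
print(behaviours * photoreceptors)             # 2.5e15 connections if every behaviour sees the whole scene

def priority_connections(B, L, n):
    """Connections needed by the priority network: B^2 (L-1) / (n L)."""
    return B**2 * (L - 1) / (n * L)

for L, n in [(10, 10), (10, 1000), (100, 1000)]:
    print(L, n, f"{priority_connections(behaviours, L, n):.2e}")
# roughly 10^11 to 10^13 connections even with n = 1000; only a very large n
# makes the count small, which defeats the purpose of the network.
```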

What do the behaviorists really mean?

• visual search can be viewed within the behaviorist paradigm and is an integral component of any intelligent behavior

• behaviorism with embedded unbounded visual search problems may require exponential time (exponential in image size) for the signal matching tasks

• active behaviorism requires intermediate representations of hypotheses by definition

• strict behaviorism cannot satisfy the definition of scaling

The last word....... "the behaviorist's strategy is not appropriate for high-bandwidth, high density sensors such as vision. Coarse grain, low density, low bandwidth sensors naturally lead to solutions involving parallelism for stimulus-action activation" (Mataric 1992).


A simple (insect-like) architecture

A reactive system does not construct descriptions of possible futures, evaluate them, and then choose one. It simply reacts (internally or externally).

An adaptive system with reactive mechanisms can be a very successful biological machine. Some purely reactive species also have a social architecture, e.g. ants, termites, and other insects.

Features of reactive organisms

The main feature of reactive systems is that they lack the ability to represent and reason about non-existent phenomena (e.g. future possible actions), the core ability of deliberative systems.

Reactive systems need not be “stateless”: some internal reactions can change internal states, and that can influence future reactions.

In particular, reactive systems may be adaptive: e.g. trainable neural nets, which adapt as a result of positive or negative reinforcement.

Some reactions will produce external behaviour. Others will merely produce internal changes.

Internal reactions may form loops.

In principle a reactive system can produce any external behaviour that more sophisticated systems can produce, but possibly requiring a larger memory for pre-stored reactive behaviours than could fit into the whole universe. Evolution seems to have discovered the advantages of deliberative capabilities.


Sometimes the ability to plan is useful

Deliberative mechanisms provide the ability to represent possibilities (e.g. possible actions, possible explanations for what is perceived).

Much, but not all, early symbolic AI was concerned with deliberative systems (planners, problem-solvers, parsers, theorem-provers).

Deliberative Mechanisms

These differ in various ways:
– the forms of representations (often data-structures in virtual machines)
– the variety of forms available (e.g. logical, pictorial, activation vectors)
– the algorithms/mechanisms available for manipulating representations
– the number of possibilities that can be represented simultaneously
– the depth of 'look-ahead' in planning
– the ability to represent future, past, or remote present objects or events
– the ability to represent possible actions of other agents
– the ability to represent mental states of others (linked to meta-management, below)
– the ability to represent abstract entities (numbers, rules, proofs)
– the ability to learn, in various ways

Some deliberative capabilities require the ability to learn new abstract associations, e.g. between situations and possible actions, between actions and possible effects
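
A minimal sketch of what representing possibilities with a bounded look-ahead might mean computationally. The toy world (a position on a line), the actions and the goal are all invented for illustration; the point is only that the system searches over imagined future states to a limited depth before acting.

```python
# Toy deliberative look-ahead: breadth-first search over imagined future states.
from collections import deque

ACTIONS = {"left": -1, "right": +1}

def plan(start, goal, max_depth=5):
    """Bounded breadth-first look-ahead; returns a list of actions or None."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        state, path = frontier.popleft()
        if state == goal:
            return path
        if len(path) >= max_depth:         # bounded depth of look-ahead
            continue
        for name, delta in ACTIONS.items():
            nxt = state + delta
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [name]))
    return None

print(plan(0, 3))    # -> ['right', 'right', 'right']
print(plan(0, 9))    # -> None: beyond this agent's look-ahead depth
```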


Evolutionary pressures on perceptual and action mechanisms for deliberative agents

New levels of perceptual abstraction (e.g. perceiving object types, abstract affordances), and support for high-level motor commands (e.g. “walk to tree”, “grasp berry”) might evolve to meet deliberative needs (e.g. taller people)

Perception and action towers in the diagram.

‘Multi-window’ perception and action, vs ‘peephole’ perception and action.

A deliberative system may need an alarm mechanism

Inputs to an alarm mechanism may come from anywhere in the system, and outputs may go to anywhere in the system.

An alarm system can override, interrupt, abort, or modulate processing in other systems.

It can also make mistakes because it uses fast rather than careful decision making.


Adding in Attention

Additional mechanisms act as attention filters, to help suppress some alarms or other disturbances during urgent and important tasks

Meta-Management

A deliberative system can easily get stuck in loops or repeat the same unsuccessful attempt to solve a sub-problem.

One way to prevent this is to have a parallel sub-system monitoring and evaluating the deliberative processes. If it detects something bad happening, then it may be able to interrupt and re-direct the processing.

Requires self-knowledge, and methods for self-evaluation and self-control

Meta-management (reflection) evolved to enable self-monitoring.


Connectionist Models

•  Feldman, J. A., & Ballard, D. H. (1982). Connectionist models and their properties. Cognitive Science, 6, 205-254.

•  Key motivation: “The critical resource that is most obvious is time. Neurons whose

basic computational speed is a few milliseconds must be made to account for complex behaviors which are carried out in a few hundred milliseconds (Posner, 1978). This means that entire complex behaviors are carried out in less than a hundred time steps. Current AI and simulation programs require millions of time steps.”

The Idea

“The idea of simultaneously evaluating many hypotheses has been successfully used in machine perception for some time (Hanson & Riseman, 1978). What has occurred to us relatively recently is that this is a natural mode of computation for widely interconnected networks of active elements like those envisioned in connectionist models. The generalization of these ideas to the connectionist view of brain and behavior is that all important encodings in the brain are in terms of the relative strengths of synaptic connections. The fundamental premise of connectionism is that individual neurons do not transmit large amounts of symbolic information. Instead they compute by being appropriately connected to large numbers of similar units.”


Contrasts

For a number of reasons (including redundancy for reliability), it is highly unlikely that there is exactly one neuron for each concept, but the point of view taken here is that the activity of a small number of neurons (say 10) encodes a concept like apple. An alternative view (Hinton & Anderson, 1981) is that concepts are represented by a "pattern of activity" in a much larger set of neurons (say 1,000) which also represent many other concepts (population code).

One of the major problems with diffuse models as a parallel computation scheme is cross-talk among concepts. Although a single concept could be transmitted in parallel, complex concepts would have to go one at a time. Simultaneously transmitting multiple concepts that shared units would cause cross-talk.

Basic Elements of CMs

•  Neuron-like computing units. Each unit will be characterized by a small number of discrete states plus:
p – a continuous value in [−10, 10], called potential
v – an output value, integers 0 ≤ v ≤ 9
i – a vector of inputs i_1, ..., i_n (the outputs of other computing units)

•  One example is the P-unit – output v is proportional to its potential p (rounded) when p > 0, and it has only one state. In other words:

p ← p + β Σ_k w_k i_k        [0 ≤ w_k ≤ 1]
v ← if p > θ then round(p − θ) else 0        [v = 0, ..., 9]

where β, θ are constants and w_k are weights on the input values. The weights are the sole locus of change with experience in the current model.

Summation can be replaced by any other function (max, min, ...).
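
A direct transcription of the P-unit update into code; β, θ and the weights below are arbitrary illustrative values rather than anything from the original paper.

```python
# P-unit update: potential p integrates weighted inputs; output v is the
# rounded supra-threshold part of p, clipped to 0..9.

def p_unit_step(p, inputs, weights, beta=0.1, theta=2.0):
    p = p + beta * sum(w * i for w, i in zip(weights, inputs))   # p <- p + beta * sum(w_k i_k)
    p = max(-10.0, min(10.0, p))                                 # potential kept in [-10, 10]
    v = min(9, round(p - theta)) if p > theta else 0             # v in 0..9
    return p, v

p = 0.0
weights = [0.5, 1.0, 0.25]            # 0 <= w_k <= 1; the only quantities that change with experience
for inputs in [[9, 9, 9], [9, 9, 9], [0, 0, 0]]:
    p, v = p_unit_step(p, inputs, weights)
    print(p, v)                        # potential accumulates, output appears once p > theta
```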


Networks of Units

•  Units can represent things, and connections between units can represent the strength of relationships, for example, a sequence of events:

•  Decision-making using Winner-Take-All networks – only the unit with the highest potential (among a set of contenders) will have output above zero after some settling time:

p ← if p > max(i_1, ..., i_j) then p else 0
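
A small sketch of this winner-take-all dynamics among mutually inspecting contenders (the potentials below are arbitrary): each unit keeps its potential only while it exceeds the largest competing potential, so after settling a single winner remains.

```python
# Winner-take-all among contending units.

def wta_settle(potentials, max_iters=10):
    p = list(potentials)
    for _ in range(max_iters):
        new_p = []
        for k, pk in enumerate(p):
            rivals = [pj for j, pj in enumerate(p) if j != k and pj > 0]
            new_p.append(pk if (not rivals or pk > max(rivals)) else 0.0)
        if new_p == p:                  # settled: nothing changed this iteration
            break
        p = new_p
    return p

print(wta_settle([3.2, 7.5, 5.1, 7.1]))   # -> [0.0, 7.5, 0.0, 0.0]
```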

Coarse and Coarse-Fine Coding

- a general strategy for reducing the number of units needed to represent a range of values with some fixed precision - Hinton (1980).

In general, D simultaneous activations of coarse cells of diameter D can give the same precision as individual precise units. For a parameter space of dimension k, a range of F values can be captured by only F^k/D^(k−1) units rather than the F^k of the naive method.
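
As a worked instance of that formula: with k = 2 dimensions, F = 100 values per dimension, and coarse cells of diameter D = 10, about 100²/10 = 1,000 coarse units suffice instead of the 100² = 10,000 units needed by the naive one-unit-per-value scheme.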


Neural Networks

Connectionism models mental or behavioral phenomena as the emergent processes of interconnected networks of simple units. Mostly, units and their connections have uniform properties throughout the network.

A neural network is a kind of connectionist architecture where, historically, units are designed to mimic properties of neurons and connections properties of synapses. Modern NNs have very little relationship to real neural or brain structures.

Originally termed Parallel Distributed Processing – classic books:

Rumelhart, D.E., J.L. McClelland and the PDP Research Group (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 1: Foundations, Cambridge, MA: MIT Press.

McClelland, J.L., D.E. Rumelhart and the PDP Research Group (1986). Parallel Distributed Processing: Explorations in the Microstructure of Cognition. Volume 2: Psychological and Biological Models, Cambridge, MA: MIT Press.

Historical Roots

•  Precursors were the perceptron theory of Frank Rosenblatt (1961).

•  But Minsky & Papert's book Perceptrons (1969) demonstrated the limits on the sorts of functions which single-layered perceptrons can calculate, showing that even simple functions like the exclusive disjunction could not be handled properly.

•  The PDP books overcame this limitation by showing that multi-level, non-linear neural networks were far more robust and could be used for a vast array of functions.


Parallel Distributed Processing

The framework involved:
A set of processing units, represented by a set of integers.
An activation for each unit, represented by a vector of time-dependent functions.
An output function for each unit, represented by a vector of functions on the activations.
A pattern of connectivity among units, represented by a matrix of real numbers indicating connection strength.
A propagation rule spreading the activations via the connections, represented by a function on the output of the units.
An activation rule for combining inputs to a unit to determine its new activation, represented by a function on the current activation and propagation.
A learning rule for modifying connections based on experience, represented by a change in the weights based on any number of variables.
An environment which provides the system with experience, represented by sets of activation vectors for some subset of the units.

A perceived limitation of PDP is that it is reductionistic. That is, all cognitive processes can be explained in terms of neural firing and communication.
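
A compact sketch instantiating several of the framework's components at once: a small connectivity matrix, a propagation rule (weighted sum of unit outputs), an activation rule, and a simple Hebbian-style learning rule. The network size, constants and the "environment" (clamping one unit) are arbitrary illustrative choices, not any published PDP model.

```python
# Tiny PDP-flavoured network: units, activations, connectivity, propagation,
# activation and learning rules; all numbers are illustrative.
import math

n_units = 3
activation = [0.0] * n_units
weights = [[0.0 if i == j else 0.5 for j in range(n_units)] for i in range(n_units)]

def output(a):                      # output function: squash activation to (0, 1)
    return 1.0 / (1.0 + math.exp(-a))

def propagate(acts):                # propagation rule: weighted sum of unit outputs
    outs = [output(a) for a in acts]
    return [sum(weights[i][j] * outs[j] for j in range(n_units)) for i in range(n_units)]

def activate(acts, net, decay=0.5): # activation rule: decayed old activation plus net input
    return [decay * a + net_i for a, net_i in zip(acts, net)]

def hebbian(acts, lr=0.05):         # learning rule: strengthen links between co-active units
    outs = [output(a) for a in acts]
    for i in range(n_units):
        for j in range(n_units):
            if i != j:
                weights[i][j] += lr * outs[i] * outs[j]

for _ in range(5):                  # the "environment": clamp unit 0, let activity spread
    activation[0] = 1.0
    activation = activate(activation, propagate(activation))
    hebbian(activation)
print([round(a, 2) for a in activation])
```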

SOAR

Soar is a symbolic cognitive architecture, created by John Laird, Allen Newell, and Paul Rosenbloom at Carnegie Mellon University, now maintained by John Laird's research group at the University of Michigan.

The main goal of the Soar project is to be able to handle the full range of capabilities of an intelligent agent, from highly routine to extremely difficult open-ended problems. In order for that to happen it needs to be able to create representations and use appropriate forms of knowledge (such as procedural, declarative, episodic). Also underlying the Soar architecture is the view that a symbolic system is essential for general intelligence.

SOAR originally stood for State, Operator And Result, reflecting this representation of problem solving as the application of an operator to a state to get a result. According to the project FAQ, the Soar development community no longer regards Soar as an acronym so it is no longer spelled all in caps though it is still representative of the core of the implementation.


SOAR

Soar is based on a production system, i.e. it uses explicit production rules to govern its behavior (these are roughly of the form "if... then...", as also used in expert systems). Problem solving can be roughly described as a search through a problem space (the collection of different states which can be reached by the system at a particular time) for a goal state (which represents the solution for the problem). This is implemented by searching for the states which bring the system gradually closer to its goal. Each move consists of a decision cycle which has an elaboration phase (in which a variety of different pieces of knowledge bearing on the problem are brought into Soar's working memory) and a decision procedure (which weighs what was found in the previous phase and assigns preferences to ultimately decide the action to be taken).

If the decision procedure just described is not able to determine a unique course of action, Soar may use different strategies, known as weak methods to solve the impasse. These methods are appropriate to situations in which knowledge is not abundant. Some examples are means-ends analysis (which may calculate the difference between each available option and the goal state) and a type of hill-climbing. When a solution is found by one of these methods, Soar uses a learning technique called chunking to transform the course of action taken into a new rule. The new rule can then be applied whenever Soar encounters the situation again (that is, there will no longer be an impasse).

SOAR Block Diagram


ACT-R

•  ACT-R (Adaptive Control of Thought—Rational) is a cognitive architecture mainly developed by John Robert Anderson at Carnegie Mellon University. Like any cognitive architecture, ACT-R aims to define the basic and irreducible cognitive and perceptual operations that enable the human mind. In theory, each task that humans can perform should consist of a series of these discrete operations.

•  Most of the ACT-R basic assumptions are also inspired by the progress of cognitive neuroscience, and ACT-R can be seen and described as a way of specifying how the brain itself is organized in a way that enables individual processing modules to produce cognition.

ACT-R Main Components

Modules. There are two types of modules:
perceptual-motor modules, which take care of the interface with the real world (i.e., with a simulation of the real world). The most well-developed perceptual-motor modules in ACT-R are the visual and the manual modules.
memory modules. There are two kinds of memory modules: declarative memory, consisting of facts, and procedural memory, made of productions. Productions represent knowledge about how we do things.

Buffers. ACT-R accesses its modules (except for the procedural-memory module) through buffers. For each module, a dedicated buffer serves as the interface with that module. The contents of the buffers at a given moment in time represent the state of ACT-R at that moment.

Pattern Matcher. The pattern matcher searches for a production that matches the current state of the buffers. Only one such production can be executed at a given moment. That production, when executed, can modify the buffers and thus change the state of the system. Cognition appears as a succession of production firings.

ACT-R is a hybrid cognitive architecture. Its symbolic structure is a production system; the sub-symbolic structure is represented by a set of massively parallel processes that can be summarized by a number of mathematical equations. The sub-symbolic equations control many of the symbolic processes. For instance, if several productions match the state of the buffers, a sub-symbolic utility equation estimates the relative cost and benefit associated with each production and decides to select for execution the production with the highest utility. Similarly, whether (or how fast) a fact can be retrieved from declarative memory depends on sub-symbolic retrieval equations, which take into account the context and the history of usage of that fact. Sub-symbolic mechanisms are also responsible for most learning processes in ACT-R.


Big Data Approaches to Intelligence

From "The End of Theory: The Data Deluge Makes the Scientific Method Obsolete", Chris Anderson, WIRED Magazine 16.07:

Peter Norvig, Google's research director: "All models are wrong, and increasingly you can succeed without them."

This is a world where massive amounts of data and applied mathematics replace every other tool that might be brought to bear. Out with every theory of human behavior, from linguistics to sociology. Forget taxonomy, ontology, and psychology. Who knows why people do what they do? The point is they do it, and we can track and measure it with unprecedented fidelity. With enough data, the numbers speak for themselves.

Correlation is enough. We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot.

The Scientific Method

The big target here isn't advertising, though. It's science. The scientific method is built around testable hypotheses. These models, for the most part, are systems visualized in the minds of scientists. The models are then tested, and experiments confirm or falsify theoretical models of how the world works. This is the way science has worked for hundreds of years.


The Big Data Approach looks like Epidemiology!

Although epidemiology is sometimes viewed as a collection of statistical tools used to elucidate the associations of exposures to health outcomes, a deeper understanding of this science is that of discovering causal relationships.

Correlation ≠ Causal Connection! Epidemiologists use gathered data and a broad range of biomedical and psychosocial theories in an iterative way to generate or expand theory, to test hypotheses, and to make educated, informed assertions about which relationships are causal, and about exactly how they are causal.

Criteria for Causality in Epidemiology

•  Strength: A small association does not mean that there is not a causal effect, though the larger the association, the more likely that it is causal.

•  Consistency: Consistent findings observed by different persons in different places with different samples strengthens the likelihood of an effect.

•  Specificity: Causation is likely if there is a very specific population at a specific site and disease with no other likely explanation. The more specific an association between a factor and an effect is, the bigger the probability of a causal relationship.

•  Temporality: The effect has to occur after the cause (and if there is an expected delay between the cause and expected effect, then the effect must occur after that delay).

•  Biological gradient: Greater exposure should generally lead to greater incidence of the effect. However, in some cases, the mere presence of the factor can trigger the effect. In other cases, an inverse proportion is observed: greater exposure leads to lower incidence.

•  Plausibility: A plausible mechanism between cause and effect is helpful (but knowledge of the mechanism is limited by current knowledge).

•  Coherence: Coherence between epidemiological and laboratory findings increases the likelihood of an effect.

•  Experiment: "Occasionally it is possible to appeal to experimental evidence".
•  Analogy: The effect of similar factors may be considered.


And there’s more
•  What is the mechanism that leads to the effect?
•  In medicine, this is important if one wishes to treat it!

•  Does representation never matter? Consider Zucker (1981):

Zucker claimed that computational models of perception have two essential components: representational languages for describing information and mechanisms that manipulate those representations. The many diverse kinds of constraints on theories discovered by experimental work, whether psychophysical, neurophysiologic, or theoretical, impact the representations and manipulation mechanisms. Zucker goes on to suggest that all of these different kinds of constraints are needed, or the likelihood of discovering the correct explanation is seriously diminished. Without computational theories and constraints, one is faced with the problem of inferring what staggering numbers of neurons are doing without a suitable language for describing either them or their scope. One of the strongest arguments for having explicit abstract representations is the fact that they provide explanatory terms for otherwise difficult (if not impossible) notions.

Reality of the Probabilistic Approach

•  Is a given dataset complete?
•  Are samples in a given dataset uniformly distributed?
•  Are algorithms truly detecting target and not background?

Probabilistic/Bayesian approaches only make sense if it is possible to:
–  ensure conditions of Bayes Theorem are met (independent and identically distributed events)
–  ensure the sample set is complete (or a good approximation to complete)
–  ensure the sample set contains no biases (over/under representation)
–  prior values exist and are reliable
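
A small worked example of the last point, using Bayes' theorem; the prevalences, sensitivity and specificity below are invented for illustration. The same likelihoods give very different posteriors under different priors, which is why unreliable priors or biased samples undermine the whole approach.

```python
# Posterior P(condition | positive test) under two different priors.

def posterior(prior, sensitivity=0.95, specificity=0.95):
    p_positive = sensitivity * prior + (1 - specificity) * (1 - prior)
    return sensitivity * prior / p_positive          # Bayes' theorem

print(posterior(prior=0.01))   # rare condition: ~0.16 despite a "95% accurate" test
print(posterior(prior=0.30))   # common condition: ~0.89 with the very same test
```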


Learning and Intelligence

•  Central questions:
–  how does intelligence come about?
–  how does intelligence influence learning?
–  how is intelligence related to efficacy of information processing subsequent to learning?

•  How to know if a program has ‘learned’?
–  A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
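
For example, for a spam filter: T = classifying incoming messages as spam or not, P = the percentage of messages classified correctly, and E = a growing collection of messages labeled by users. If the percentage correct improves as more labeled messages are seen, the program has learned in this sense.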

Algorithm Types

Machine learning algorithms can be organized into a taxonomy based on the desired outcome of the algorithm:

•  Supervised learning generates a function that maps inputs to desired outputs (also called labels, because they are often provided by human experts labeling the training examples). For example, in a classification problem, the learner approximates a function mapping a vector into classes by looking at input-output examples of the function.

•  Unsupervised learning models a set of inputs, like clustering.

•  Semi-supervised learning combines both labeled and unlabeled examples to generate an appropriate function or classifier.

•  Reinforcement learning learns how to act given an observation of the world. Every action has some impact in the environment, and the environment provides feedback in the form of rewards that guides the learning algorithm.

•  Transduction tries to predict new outputs based on training inputs, training outputs, and test inputs.

•  Learning to learn learns its own inductive bias based on previous experience.


Challenges (from Brooks 1991)

There seems a quite natural decomposition of the task into three distinct levels of abstraction:

Micro: The study of how distinct couplings between the creature and its environment are maintained for some particular task. Example tasks are temperature regulation, wall following, climbing, etc. Primary concerns here are what forms of perception are necessary, and what relationships exist between perception, internal state, and action (i.e., how behavior is specified or described).

Macro: The study of how the many micro perceptions and behaviors are integrated into a complete individual creature. Primary concerns here are how independent various perceptions and behaviors can be, how much they must rely on and interfere with each other, how a competent complete creature can be built in such a way as to accommodate all the required individual behaviors, and to what extent complex behaviors can emerge from simple reflexes.

Multitude: The study of how a multitude of individual creatures interact as they go about their business. Primary concerns here are the relationships between individuals' behaviors, the amount and type of communication between creatures, the way the environment reacts to multiple individuals, and the resulting patterns of behavior and their impacts upon the environment (which would not occur in the case of isolated individuals).

Micro Level

The particular challenges at the micro level are as follows.

Convergence: Demonstrate or prove that a creature is programmed in such a way that its external behavior will indeed achieve a particular task successfully. For instance, we may want to give some set of initial conditions for a creature, and some limitations on possible worlds in which it is placed, and show that under those conditions, the creature is guaranteed to follow a particular wall, rather than diverge and get lost.

Complexity: Deal with the complexity of real world environments, and sift out the relevant aspects of received sensations rather than being overwhelmed with multitudes of data.

Synthesis: Given a particular task, automatically derive a program for the creature so that it carries out that task.


Macro Level

Coherence: Even though many behaviors may be active at once, or are being actively switched on or off, the creature should still appear to an observer to have coherence of action and goals. It should not be rapidly switching between inconsistent behaviors, nor should two behaviors be active simultaneously if they interfere with each other to the point that neither operates successfully.

Salience: The behaviors that are active should be salient to the situation the creature finds itself in, e.g., it should recharge itself when the batteries are low, not when they are full.

Adequacy: The behavior selection mechanism must operate in such a way that the long term goals that the creature designer has for the creature are met, e.g., a floor-cleaning robot should successfully clean the floor in normal circumstances, besides doing all the ancillary tasks that are necessary for it to be successful at that.

Multitude Level

Emergence: Given a set of behaviors programmed into a set of creatures, we would like to be able to predict what the global behavior of the system will be, and as a consequence determine the differential effects of small changes to the individual creatures on the global behavior.

Synthesis: As at the micro level, given a particular task, automatically derive a program for the set of creatures so that they carry out the task.

Individuality: Robustness can be achieved if all creatures are interchangeable. A fixed number of classes of creatures, where all creatures within a class are identical, is also robust, but somewhat less so. The issue then is, given a task, to decide how many classes of creatures are necessary.

Cooperation: In some circumstances creatures should be able to achieve more by cooperating—the form and specification of such cooperation need to be understood.

Interference: Creatures may interfere with one another. Protocols for avoiding this when it is undesirable must be included in the design of the creatures' instructions.

Density dependence: The global behavior of the system may be dependent on the density of the creatures and the resources they consume. A characterization of this dependence is desirable. At the two ends of the spectrum it may be the case that (a) a single creature given n units of time performs identically to n robots each given 1 unit of time, and (b) the global task might not be achieved at all if there are fewer than m creatures.

Communication: Performance may be increased by increasing the amount of explicit communication between creatures, but the relationship between the amount of communication increase and performance increase needs to be understood.


Josh Tenenbaum from AAAI 2012, "How to Grow a Mind: Statistics, Structure and Abstraction".

http://videolectures.net/aaai2012_tenenbaum_grow_mind/

http://mit150.mit.edu/symposia/brains-minds-machines