
Page 1: 6CS6.2 Unit 5 Learning

Artificial Intelligence (6CS6.2)

Unit 5 : Learning

Page 2: 6CS6.2 Unit 5 Learning

Topics

5.1 Introduction to Learning and Learning Methods
    5.1.1 Rote Learning
    5.1.2 Learning by Taking Advice
    5.1.3 Learning in Problem Solving Experience
    5.1.4 Learning from Examples or Induction
    5.1.5 Explanation Based Learning
    5.1.6 Discovery
    5.1.7 Analogy
5.2 Formal Learning Theory
5.3 Introduction to Neural Networks and Applications
5.4 Common Sense Reasoning
5.5 Expert Systems

Page 3: 6CS6.2 Unit 5 Learning

5.1 Introduction to Learning

• An AI system cannot be called intelligent until it can learn to do new things and adapt to new situations, rather than simply doing what it is told to do.

• Definition of learning: changes in the system that are adaptive in the sense that they enable the system to do the same task more efficiently and more effectively the next time.

• Learning covers a wide range of phenomena:
  1. Skill Refinement: practice improves skills. The more you play tennis, the better you get.
  2. Knowledge Acquisition: knowledge is generally acquired through experience.

Page 4: 6CS6.2 Unit 5 Learning

Various Learning Mechanisms

1. Simple storing of computed information, or rote learning, is the most basic learning activity. Many computer programs, e.g., database systems, can be said to learn in this sense, although most people would not call such simple storage learning.

2. Another way we learn is through taking advice from others. Advice taking is similar to rote learning, but high-level advice may not be in a form simple enough for a program to use directly in problem solving.

3. People also learn through their own problem-solving experience.

4. Learning from examples: we often learn to classify things in the world without being given explicit rules. Learning from examples usually involves a teacher who helps us classify things by correcting us when we are wrong.

Page 5: 6CS6.2 Unit 5 Learning

5.1.1 : Rote Learning *

• When a computer stores a piece of data, it is performing a basic form of learning. In data caching, we store computed values so that we need not re-compute them later.

• When computation is more expensive than recall, this strategy can save a significant amount of time. Caching has been used in AI programs to produce some surprising performance improvements. Such caching is known as rote learning.

• Rote learning does not involve any sophisticated problem-solving capabilities. It shows the need for some capabilities required of complex learning systems, such as:
  – Organized storage of information
  – Generalization

Page 6: 6CS6.2 Unit 5 Learning

Rote Learning : Example

In figure 1, the value of node A is computed as 10. This value is stored for future use.

In figure 2, the value of A is required again to solve a new game tree.

Instead of re-computing the value of node A, rote learning is used: the stored value of node A is applied directly.
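
The caching idea can be sketched in a few lines of Python; the node name and the stand-in evaluation function below are illustrative, not taken from the figures.

```python
# A minimal sketch of rote learning as caching: store a computed value so the
# same computation need not be repeated later.
cache = {}

def evaluate(node):
    """Return the (expensive) value of a game-tree node, caching the result."""
    if node in cache:                  # rote learning: recall instead of recompute
        return cache[node]
    value = expensive_evaluation(node)
    cache[node] = value
    return value

def expensive_evaluation(node):
    # Placeholder for a costly minimax search; here node "A" evaluates to 10.
    return {"A": 10}.get(node, 0)

print(evaluate("A"))   # computed and stored
print(evaluate("A"))   # recalled directly from the cache
```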

Page 7: 6CS6.2 Unit 5 Learning

5.1.2 : Learning by Taking Advice

• A computer can do very little without a program to run. When a programmer writes a series of instructions into a computer, a basic type of learning is taking place: the programmer is a sort of teacher and the computer is a sort of student.

• After being programmed, the computer is able to do something it previously could not.

• Executing a program may not be such a simple matter. Suppose the program is written in a high-level language such as Prolog; some interpreter or compiler must intervene to change the teacher's instructions into code that the machine can execute directly.

• People process advice in an analogous way. In chess, the advice "take control of the chess board center" is useless unless the player can translate it into concrete moves and plans. A computer program might make use of the advice by adjusting its static evaluation function to include a factor based on the number of center squares attacked by its own pieces, as sketched below.
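
As a hedged illustration, the advice might be operationalized as an extra term in the evaluation function; the square names, weight and function below are assumptions made for this sketch, not any particular chess program's code.

```python
# Hypothetical operationalization of "take control of the chess board center".
CENTER_SQUARES = {"d4", "d5", "e4", "e5"}

def static_evaluation(material_score, squares_attacked_by_us, center_weight=0.1):
    """Material score plus a term rewarding attacks on center squares."""
    center_control = len(CENTER_SQUARES & set(squares_attacked_by_us))
    return material_score + center_weight * center_control

print(static_evaluation(1.0, ["d4", "e5", "a3"]))   # 1.0 + 0.1 * 2 = 1.2
```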

Page 8: 6CS6.2 Unit 5 Learning

Learning by Advice cont…

• FOO is a program that accepts advice for playing "hearts", a card game. A human user first translates the advice from English into a representation that FOO can understand.

• A human can watch FOO play, detect new mistakes, and correct them through yet more advice, such as "play high cards when it is safe to do so".

• Limitation: the ability to operationalize the knowledge is a challenging job in learning by advice.

Page 9: 6CS6.2 Unit 5 Learning

5.1.3 : Learning in Problem Solving Experience

• Can a program get better without the aid of a teacher?
• It can, by generalizing from its own experiences.

• Various techniques are as follows:
  5.1.3.a Learning by Parameter Adjustment
  5.1.3.b Learning with Macro-Operators
  5.1.3.c Learning by Chunking

Page 10: 6CS6.2 Unit 5 Learning

5.1.3.a : Learning by Parameter Adjustment

• Many programs rely on an evaluation procedure that combines information from several sources into a single summary statistic.
  1. Game playing programs do this in their static evaluation functions, in which a variety of factors such as piece advantage and mobility are combined into a single score reflecting the desirability (goodness or usefulness) of a particular board position.
  2. Pattern classification programs often combine several features to determine the correct category into which a given stimulus should be placed.

• In designing such programs, it is often difficult to know a priori how much weight should be attached to each feature being used. One way of finding the correct weights is to begin with some estimate of the correct settings and then to let the program modify the settings on the basis of its experience.

• Features that appear to be good predictors of overall success have their weights increased, while those that do not have their weights decreased. Samuel's checkers program uses a static evaluation function in polynomial form: c1·t1 + c2·t2 + … + c16·t16

• The t terms are the values of the sixteen features that contribute to the evaluation. The c terms are the coefficients attached to each of these values. As learning progresses, the c values change, as sketched below.
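
A minimal sketch of the idea, assuming a simple error-driven update rule (Samuel's actual coefficient-adjustment procedure was more involved):

```python
# Parameter adjustment for an evaluation polynomial c1*t1 + ... + cN*tN:
# coefficients of features that predicted the outcome well are nudged up,
# the rest are nudged down. The update rule and learning rate are assumptions.
def evaluate(coeffs, features):
    return sum(c * t for c, t in zip(coeffs, features))

def adjust(coeffs, features, outcome, predicted, rate=0.01):
    """Move each coefficient in the direction that reduces the prediction error."""
    error = outcome - predicted
    return [c + rate * error * t for c, t in zip(coeffs, features)]

coeffs = [0.5, 0.5]                    # e.g. piece advantage, mobility
features = [2.0, 3.0]                  # feature values for one board position
predicted = evaluate(coeffs, features)
coeffs = adjust(coeffs, features, outcome=4.0, predicted=predicted)
print(coeffs)                          # updated coefficients
```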

Page 11: 6CS6.2 Unit 5 Learning

5.1.3.b : Learning by Macro-Operators

• Sequences of actions that can be treated as a whole are called macro-operators.

• Example: suppose we want to go to the main post office of the city. Our solution may involve getting in the car, starting it, and driving along a certain route. Substantial planning may go into choosing the appropriate route.

• Here, we need not plan "how to start the car". We can use START-CAR as an atomic action, even though it actually consists of several primitive actions: (1) sitting down, (2) adjusting the mirror, (3) inserting the key and (4) turning the key.

• Macro-operators were used in the early problem-solving system STRIPS. After each problem-solving episode, the learning component takes the computed plan and stores it away as a macro-operator, or MACROP.

• A MACROP is just like a regular operator, except that it consists of a sequence of actions, not just a single one, as sketched below.
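
A minimal sketch of a macro-operator, assuming primitive actions are functions over a set-of-facts state (STRIPS itself used add/delete lists):

```python
# Primitive actions for the START-CAR example, each mapping a state to a new state.
def sit_down(state):      return state | {"in_car"}
def adjust_mirror(state): return state | {"mirror_adjusted"}
def insert_key(state):    return state | {"key_inserted"}
def turn_key(state):      return state | {"engine_running"}

def make_macrop(*actions):
    """Build a macro-operator: one operator that applies a whole sequence of actions."""
    def macrop(state):
        for action in actions:
            state = action(state)
        return state
    return macrop

START_CAR = make_macrop(sit_down, adjust_mirror, insert_key, turn_key)
print(START_CAR(set()))   # the planner can now treat START-CAR as atomic
```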

Page 12: 6CS6.2 Unit 5 Learning

5.1.3.c : Learning by Chunking

• Chunking is a process similar in flavor to macro-operators. The idea of chunking comes from the psychological literature on memory and problem solving. Its computational basis is in production systems.

• When a system detects a useful sequence of production firings, it creates a chunk, which is essentially a large production that does the work of an entire sequence of smaller ones.

• SOAR is an example of a production system that uses chunking.

• Chunks learned during the initial stages of solving a problem are applicable in the later stages of the same problem-solving episode. After a solution is found, the chunks remain in memory, ready for use in the next problem.

• At present, chunking is inadequate for duplicating the contents of large directly-computed macro-operator tables.

Page 13: 6CS6.2 Unit 5 Learning

5.1.4 : Learning from Examples: Induction **

• Classification is the process of assigning, to a particular input, the name of a class to which it belongs. The classes from which the classification procedure can choose can be described in a variety of ways.

• Their definition will depend on the use to which they are put. Classification is an important component of many problem-solving tasks.

• Before classification can be done, the classes it will use must be defined:
  1. Isolate a set of features that are relevant to the task domain. Define each class by a weighted sum of values of these features. Example: if the task is weather prediction, the parameters can be measurements such as rainfall, location of cold fronts, etc.
  2. Isolate a set of features that are relevant to the task domain. Define each class as a structure composed of these features. Example: in classifying animals, the features can be such things as color, length of neck, etc.

• The idea of producing a classification program that can evolve its own class definitions is called concept learning or induction.

Page 14: 6CS6.2 Unit 5 Learning

Winston's Learning Program

• This program is an early structural concept learning program. It operates in a simple blocks-world domain. Its goal was to construct representations of the definitions of concepts in the blocks domain.

• For example, it learned the concepts House, Tent and Arch.

• A near miss is an object that is not an instance of the concept in question but that is very similar to such instances.

Page 15: 6CS6.2 Unit 5 Learning

Winston’s Learning Program

The figure shows the structural description of a house.

Page 16: 6CS6.2 Unit 5 Learning

Winston’s Learning Program

The figure shows the structural description of an arch. Figure (b) shows a brick supported by two bricks. Figure (c) shows a wedge supported by two bricks.

Page 17: 6CS6.2 Unit 5 Learning

Basic Approach of Winston's Program

1. Begin with a structural description of one known instance of the concept. Call that description the concept definition.

2. Examine descriptions of other known instances of the concept. Generalize the definition to include them.

3. Examine the descriptions of near misses of the concept. Restrict the definition to exclude these. (A small sketch of this generalize/restrict loop is given below.)
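
A minimal sketch of the generalize/restrict loop, assuming instances are flat sets of attribute facts rather than Winston's richer structural (semantic-net) descriptions:

```python
def learn(positives, near_misses):
    # 1. Start with the description of the first known instance.
    definition = set(positives[0])
    # 2. Generalize: keep only what all positive instances share.
    for example in positives[1:]:
        definition &= set(example)
    # 3. Restrict: anything missing from a near miss becomes a required (must-have) link.
    required = set()
    for miss in near_misses:
        required |= definition - set(miss)
    return definition, required

arches = [
    {"has_posts", "has_lintel", "lintel_supported_by_posts", "posts_do_not_touch"},
    {"has_posts", "has_lintel", "lintel_supported_by_posts", "posts_do_not_touch",
     "lintel_is_wedge"},
]
near_miss = [{"has_posts", "has_lintel", "lintel_supported_by_posts"}]  # posts touch
definition, required = learn(arches, near_miss)
print(definition)   # shared description of the positives
print(required)     # {"posts_do_not_touch"} is marked as essential
```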

Page 18: 6CS6.2 Unit 5 Learning

5.1.5 : Explanation-Based Learning

• Limitation of learning by example: induction learning requires a substantial number of training instances for describing a complex concept.

• But human beings can learn quite a bit from single examples. Humans do not need to see dozens of positive and negative examples of fork (chess) positions in order to learn to avoid this trap in the future and perhaps use it to their advantage.

• What makes such single-example learning possible? The answer is knowledge. Much of the recent work in machine learning has moved away from the empirical, data-intensive approach described in the last section toward this more analytical, knowledge-intensive approach.

• A number of independent studies led to the characterization of this approach as explanation-based learning (EBL). An EBL system attempts to learn from a single example x by explaining why x is an example of the target concept. The explanation is then generalized, and the system's performance is improved through the availability of this knowledge.

Page 19: 6CS6.2 Unit 5 Learning

EBL cont…

• We can think of EBL programs as accepting the following as input:
  – A training example
  – A goal concept: a high-level description of what the program is supposed to learn
  – An operationality criterion: a description of which concepts are usable
  – A domain theory: a set of rules that describe relationships between objects and actions in a domain

• From this, EBL computes a generalization of the training example that is sufficient to describe the goal concept and also satisfies the operationality criterion.

• Explanation-based generalization (EBG) is an algorithm for EBL and has two steps: (1) explain, (2) generalize.

• During the explanation step, the domain theory is used to prune away all the unimportant aspects of the training example with respect to the goal concept. What is left is an explanation of why the training example is an instance of the goal concept. This explanation is expressed in terms that satisfy the operationality criterion. The next step is to generalize the explanation as far as possible while still describing the goal concept.
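
A minimal propositional sketch of the two EBG steps, using the classic "cup" example; the domain theory, facts and helper function below are illustrative assumptions (a full EBG works over first-order proofs and variablizes them):

```python
# Domain theory: each non-operational concept is defined by sub-concepts that
# must hold of the same object for it to hold.
domain_theory = {
    "cup": ["liftable", "stable", "open_vessel"],
    "liftable": ["light", "has_handle"],
    "stable": ["flat_bottom"],
}

# Training example: operational facts observed about one specific object.
example_facts = {"light", "has_handle", "flat_bottom", "open_vessel", "made_of_china"}

def explain(goal, facts, theory):
    """Return the operational leaves of a proof that the example satisfies `goal`,
    or None if no explanation exists."""
    if goal not in theory:                       # operational concept
        return [goal] if goal in facts else None
    leaves = []
    for sub in theory[goal]:
        sub_leaves = explain(sub, facts, theory)
        if sub_leaves is None:
            return None
        leaves += sub_leaves
    return leaves

# Explain, then generalize: only the facts used in the proof are kept, so the
# irrelevant fact "made_of_china" is pruned away.
leaves = explain("cup", example_facts, domain_theory)
print("X is a cup if X is", " and ".join(leaves))
```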

Page 20: 6CS6.2 Unit 5 Learning

5.1.6 : Discovery

• Learning is the process by which one entity acquires knowledge. Usually that knowledge is already possessed by some number of other entities who may serve as teachers.

• Discovery is a restricted form of learning in which one entity acquires knowledge without the help of a teacher.

• Various types are as follows:
  5.1.6.a Theory-Driven Discovery
  5.1.6.b Data-Driven Discovery
  5.1.6.c Clustering

Page 21: 6CS6.2 Unit 5 Learning

5.1.6.a : Theory-Driven Discovery (AM)

• Discovery is certainly a type of learning. More precisely, it is a type of problem solving. Suppose we want to build a program to discover things in mathematics; such a program would have to rely heavily on problem-solving techniques.

• AM was written by Lenat, and it worked from a few basic concepts of set theory to discover a good deal of standard number theory.

• AM exploited a variety of general-purpose AI techniques. It used a frame system to represent mathematical concepts. One of the major activities of AM is to create new concepts and fill in their slots.

• AM uses heuristic search, guided by a set of 250 heuristic rules representing hints about activities that are likely to lead to "interesting" discoveries.

• In one run AM discovered the concept of prime numbers. How did it do it?
  – Having stumbled onto the natural numbers, AM explored operations such as addition, multiplication and their inverses. It created the concept of divisibility and noticed that some numbers had very few divisors.

Page 22: 6CS6.2 Unit 5 Learning

5.1.6.b : Data-Driven Discovery (BACON)

• AM showed how discovery might occur in a theoretical setting. This type of scientific discovery has inspired several computer models; BACON is one of them.

• Langley presented a model of data-driven scientific discovery that has been implemented as a program called BACON (named after Sir Francis Bacon, a philosopher of science). BACON begins with a set of variables for a problem.

• Example: in the study of the behavior of gases, the variables are p (pressure), V (volume), n (amount in moles) and T (temperature). A law, called the Ideal Gas Law, relates these variables. BACON can derive this law on its own.

• First, BACON holds the variables n and T constant, performing experiments at different pressures p1, p2 and p3. BACON notices that as the pressure increases, the volume V decreases. For all values of n, p, V and T, pV/nT = 8.32, which is the ideal gas law as found by BACON (see the sketch below).

• BACON also discovered a wide variety of scientific laws, such as Kepler's third law, Ohm's law, the conservation of momentum and Joule's law.

• Limitation: much more work must be done in areas of science that BACON does not model.
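
A minimal sketch of the data-driven step BACON performs on this example: noticing that the combination pV/nT is invariant across observations. The numbers below are illustrative, generated from the gas law itself:

```python
observations = [
    # (pressure in kPa, volume in litres, moles, temperature in kelvin)
    (100.0, 24.96, 1.0, 300.0),
    (200.0, 12.48, 1.0, 300.0),
    (300.0,  8.32, 1.0, 300.0),
]

ratios = [p * V / (n * T) for (p, V, n, T) in observations]
if max(ratios) - min(ratios) < 1e-6:           # the ratio is (nearly) invariant
    print("Discovered law: p*V/(n*T) =", round(ratios[0], 2))   # ≈ 8.32
```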

Page 23: 6CS6.2 Unit 5 Learning

5.1.6.c : Clustering

• Clustering is very similar to induction. In inductive learning a program learns to classify objects based on the labeling provided by a teacher.

• In clustering, no class labels are provided. The program must discover for itself the natural classes that exist for the objects, in addition to a method for classifying instances.

• AUTOCLASS is one program that accepts a number of training cases and hypothesizes a set of classes. For any given case, the program provides a set of probabilities that predict into which classes the case is likely to fall.

• In one application, AUTOCLASS found meaningful new classes of stars from their infrared spectral data. This was an instance of true discovery by computer, since the facts it discovered were previously unknown to astronomy. AUTOCLASS uses statistical Bayesian reasoning of the type discussed earlier.
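
A minimal sketch of clustering without labels; AUTOCLASS itself uses a Bayesian mixture model, so plain one-dimensional k-means is used here only to illustrate discovering natural groups in unlabeled data:

```python
def kmeans(points, k=2, iterations=20):
    centers = points[:k]                       # naive initialisation
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                       # assign each point to nearest center
            nearest = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[nearest].append(p)
        centers = [sum(c) / len(c) if c else centers[i]
                   for i, c in enumerate(clusters)]
    return centers, clusters

data = [1.0, 1.2, 0.8, 9.8, 10.1, 10.3]        # two obvious natural classes
centers, clusters = kmeans(data)
print(centers)                                  # roughly 1.0 and 10.1
print(clusters)
```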

Page 24: 6CS6.2 Unit 5 Learning

5.1.7 : Analogy

• Analogy is a powerful inference tool. Our language and reasoning are full of analogies:
  1. Last month, the stock market was a roller coaster.
  2. Virat Kohli was like a fire engine in the last match.
  3. Mamata Banerjee flip-flops on the railway budget.

• To understand such examples, a complicated mapping is done between what appear to be dissimilar concepts. To understand example 1, it is necessary to do two things:
  1. Pick out one key property of a roller coaster, namely that it travels up and down rapidly.
  2. Realize that physical travel is itself an analogy for the numerical fluctuations of the stock market.

• Limitation: this is no easy trick. The space of possible analogies is very large. An AI program that is unable to grasp analogy will be difficult to talk to and consequently difficult to teach. Thus analogical reasoning is an important factor in learning by advice taking. Humans often solve problems by making analogies to things they already understand how to do.

Page 25: 6CS6.2 Unit 5 Learning

5.1.7.a : Transformational Analogy

• Suppose we are asked to prove a theorem in plane geometry. We look for a previous theorem that is very similar, copy its proof, and make substitutions where required.

• The idea here is to transform the solution of a previous problem into a solution of the current problem.

Page 26: 6CS6.2 Unit 5 Learning

5.1.7.b : Derivational Analogy

• Transformational analogy does not look at how the old problem was solved; it only looks at the final solution of the old problem.

• Often the internal details of the old solution are also relevant to solving the new problem. The detailed history of a problem-solving episode is called its derivation.

• An analogy that uses this derivation in the new problem-solving process is referred to as derivational analogy.

Page 27: 6CS6.2 Unit 5 Learning

5.2 Formal Learning Theory *

• Learning has attracted the attention of mathematicians and theoretical computer scientists. Inductive learning in particular has received considerable attention.

• Formally, a device learns a concept if, given positive and negative examples, it can produce an algorithm that will classify future examples correctly with probability 1/h.

• The complexity of learning a concept is a function of three factors:
  1. error tolerance (h)
  2. the number of binary features present in the examples (t)
  3. the size of the rule necessary to make the discrimination (f)

• If the number of training examples required is polynomial in h, t, and f, then the concept is said to be learnable.

Page 28: 6CS6.2 Unit 5 Learning

Formal Learning Theory cont..

• Example: given positive and negative examples of strings in some regular language, can we efficiently induce the finite automaton that produces all and only the strings in the language? The answer is no; an exponential number of computational steps is required.

• It is difficult to tell how such mathematical studies of learning will affect the ways in which we solve AI problems in practice.

• After all, people are able to solve many exponentially hard problems by using knowledge to constrain the space of possible solutions.

• Perhaps mathematical theory will one day be used to quantify the use of such knowledge, but this prospect seems far off.

Page 29: 6CS6.2 Unit 5 Learning

Formal Learning Theory cont..

• Three positive and three negative examples of the concept Elephant are shown below (the sixth column marks whether the animal is an elephant):

  Gray?  Mammal?  Large?  Vegetarian?  Wild?  Elephant?  Animal
   +       +        +         +         +        +       Elephant
   +       +        +         -         +        +       Elephant
   +       +        -         +         +        -       Mouse
   -       +        +         +         +        -       Giraffe
   +       -        +         -         +        -       Dinosaur
   +       +        +         +         -        +       Elephant
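
As an illustration (not the procedure analyzed by the formal theory), a conjunctive definition of Elephant can be induced from these six examples by intersecting the features that are true of every positive example:

```python
FEATURES = ["gray", "mammal", "large", "vegetarian", "wild"]
examples = [
    # (feature values, is it an elephant?)
    ([1, 1, 1, 1, 1], True),    # elephant
    ([1, 1, 1, 0, 1], True),    # elephant
    ([1, 1, 0, 1, 1], False),   # mouse
    ([0, 1, 1, 1, 1], False),   # giraffe
    ([1, 0, 1, 0, 1], False),   # dinosaur
    ([1, 1, 1, 1, 0], True),    # elephant
]

positives = [x for x, label in examples if label]
# Keep a feature only if it holds in every positive example; here the resulting
# conjunction also happens to reject all the negative examples.
definition = [f for i, f in enumerate(FEATURES) if all(x[i] for x in positives)]
print("Elephant :=", " AND ".join(definition))   # gray AND mammal AND large
```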

Page 30: 6CS6.2 Unit 5 Learning

5.3 Neural Networks *

Benefits of NN:
• Pattern recognition, learning, classification, generalization and abstraction, and interpretation of incomplete and noisy inputs
• Character, speech and visual recognition can be done
• Can provide some human problem-solving characteristics
• Can tackle new kinds of problems
• Robust and fast
• Flexible and easy to maintain
• Powerful hybrid systems

Limitations of NN:
• Do not do well at tasks that are not done well by people
• Lack explanation capabilities
• Limitations and expense of hardware technology restrict most applications to software simulations
• Training time can be excessive and tedious
• Usually requires large amounts of training and test data
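
A minimal sketch of neural-network style learning from examples: a single perceptron trained on the AND function. Practical networks are multi-layer and gradient-trained, so this only illustrates the idea of learning weights from data:

```python
def predict(weights, bias, x):
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias > 0 else 0

def train(samples, epochs=20, rate=0.1):
    weights, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            error = target - predict(weights, bias, x)   # perceptron update rule
            weights = [w + rate * error * xi for w, xi in zip(weights, x)]
            bias += rate * error
    return weights, bias

samples = [([0, 0], 0), ([0, 1], 0), ([1, 0], 0), ([1, 1], 1)]   # logical AND
weights, bias = train(samples)
print([predict(weights, bias, x) for x, _ in samples])           # [0, 0, 0, 1]
```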

Page 31: 6CS6.2 Unit 5 Learning

5.4 Commonsense Reasoning

Page 32: 6CS6.2 Unit 5 Learning

5.5 Expert Systems **

• Definition: an expert system is a specific type of AI system that solves problems which are usually solved by human experts.

• Humans need special training to solve expert-level problems. Examples of expert-level problems are architecture design, engineering tasks, medical diagnosis, etc.

• To solve expert-level problems, an ES requires the following:
  1. Access to a domain-specific knowledge base
  2. One or more reasoning mechanisms to exploit
  3. A mechanism for explanation

Page 33: 6CS6.2 Unit 5 Learning

Architecture of Expert Systems

Page 34: 6CS6.2 Unit 5 Learning

Components of Expert Systems

1. Knowledge Acquisition Subsystem
2. Knowledge Base
3. Inference Engine
4. User Interface
5. Blackboard (Workplace)
6. Explanation Subsystem (Justifier)
7. Knowledge Refining System
8. User


Page 35: 6CS6.2 Unit 5 Learning

1. Knowledge Acquisition Subsystem
• Knowledge acquisition is the accumulation, transfer and transformation of problem-solving expertise from experts and/or documented knowledge sources to a computer program, for constructing or expanding the knowledge base.
• Requires a knowledge engineer.

2. Knowledge Base
• The knowledge base contains the knowledge necessary for understanding, formulating, and solving problems.
• Two basic knowledge base elements:
  – Facts
  – Special heuristics, or rules, that direct the use of knowledge
• Knowledge is the primary raw material of an ES.
• Incorporates a knowledge representation.

Page 36: 6CS6.2 Unit 5 Learning

3. Inference Engine:
• The brain of the ES
• The control structure (rule interpreter)
• Provides the methodology for reasoning
• Major elements:
  1. Interpreter
  2. Scheduler
  3. Consistency Enforcer

4. User Interface:
• Language processor for friendly, problem-oriented communication
• Consists of menus and graphics
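
A minimal sketch of the inference-engine idea: forward chaining over if-then rules held in a knowledge base, with derived facts written back to working memory (the blackboard). The rules and facts below are illustrative only:

```python
rules = [
    # (conditions that must all hold, conclusion to add)
    ({"fever", "rash"}, "measles_suspected"),
    ({"measles_suspected", "not_vaccinated"}, "refer_to_doctor"),
]
facts = {"fever", "rash", "not_vaccinated"}     # working memory / blackboard

changed = True
while changed:                                   # fire rules until nothing new is derived
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True

print(facts)   # now includes measles_suspected and refer_to_doctor
```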

Page 37: 6CS6.2 Unit 5 Learning

5. Blackboard (Workplace):
• An area of working memory used to
  1. Describe the current problem
  2. Record intermediate results
• Records intermediate hypotheses and decisions:
  1. Plan
  2. Agenda
  3. Solution

6. Explanation Subsystem (Justifier):
• Traces responsibility and explains the ES behavior by interactively answering questions: Why? How? What? (Where? When? Who?)

7. Knowledge Refining System:
• Learning for improving performance

Page 38: 6CS6.2 Unit 5 Learning

Human Elements in Expert Systems

1. Expert: the expert has the special knowledge, judgment, experience and methods to give advice and solve problems. He provides knowledge about task performance.

2. Knowledge Engineer: the KE helps the expert structure the problem area by interpreting and integrating human answers to questions, drawing analogies, posing counterexamples, and bringing to light conceptual difficulties. The KE is usually the system builder.

3. User: possible classes of users:
  • A non-expert client seeking direct advice (ES acts as a Consultant)
  • A student who wants to learn (ES acts as an Instructor)
  • An ES builder improving or increasing the knowledge base (ES acts as a Partner)
  • An expert (ES acts as a Colleague or Assistant)
  The expert and the knowledge engineer should anticipate users' needs and limitations when designing the ES.

4. Others: System Builder, Systems Analyst, Tool Builder, Vendors, Support Staff, Network Expert

Page 39: 6CS6.2 Unit 5 Learning

ES Shell
• Includes all generic ES components
• But no knowledge
  – Example: EMYCIN from MYCIN (E = Empty)

Page 40: 6CS6.2 Unit 5 Learning

Expert Systems Benefits:
1. Increased output and productivity
2. Enhancement of problem solving and decision making
3. Improved decision quality
4. Increased process and product quality
5. Flexibility
6. Easier equipment operation
7. Operation in hazardous environments
8. Integration of several experts' opinions
9. Can work with incomplete or uncertain information

Limitations of Expert Systems:
1. ES work well only in a narrow domain of knowledge
2. Lack of trust by end-users
3. Knowledge is not always readily available; expertise can be hard to extract from humans
4. Each expert's approach may be different, yet correct
5. Expert system users have natural cognitive limits
6. Knowledge engineers are rare and expensive
7. ES may not be able to arrive at valid conclusions

Page 41: 6CS6.2 Unit 5 Learning

Thanks