chapter11.ppt

Machine Learning Chapter 11

Transcript of chapter11.ppt

  • Machine Learning: Chapter 11

  • Machine Learning: What is learning?

    "That is what learning is. You suddenly understand something you've understood all your life, but in a new way." (Doris Lessing, 2007 Nobel Prize in Literature)

  • Machine Learning: How to construct programs that automatically improve with experience.

    A learning problem is defined by:
    Task T
    Performance measure P
    Training experience E

  • Machine Learning: Chess game
    Task T: playing chess games
    Performance measure P: percent of games won against opponents
    Training experience E: playing practice games against itself

  • Machine Learning: Handwriting recognition
    Task T: recognizing and classifying handwritten words
    Performance measure P: percent of words correctly classified
    Training experience E: handwritten words with given classifications

  • Designing a Learning System: Choosing the training experience
    Direct or indirect feedback
    Degree of the learner's control
    Representative distribution of examples

  • Designing a Learning System: Choosing the target function
    Type of knowledge to be learned
    Function approximation

  • Designing a Learning System: Choosing a representation for the target function
    Expressive representation for a close function approximation
    Simple representation for simple training data and learning algorithms

  • Designing a Learning System: Choosing a function approximation algorithm (learning algorithm)

  • Designing a Learning System: Chess game
    Task T: playing chess games
    Performance measure P: percent of games won against opponents
    Training experience E: playing practice games against itself
    Target function: V: Board → ℝ

  • Designing a Learning System: Chess game
    Target function representation:
    V^(b) = w0 + w1x1 + w2x2 + w3x3 + w4x4 + w5x5 + w6x6
    x1: the number of black pieces on the board
    x2: the number of red pieces on the board
    x3: the number of black kings on the board
    x4: the number of red kings on the board
    x5: the number of black pieces threatened by red
    x6: the number of red pieces threatened by black
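
    A minimal sketch of this linear evaluation function; the weight and board-feature values are assumed for illustration, not given on the slide:

        # V^(b) = w0 + w1*x1 + ... + w6*x6 over the six board features listed above.
        def v_hat(weights, features):
            # weights = [w0, w1, ..., w6]; features = [x1, ..., x6]
            return weights[0] + sum(w * x for w, x in zip(weights[1:], features))

        weights = [0.5, 1.0, -1.0, 2.0, -2.0, -0.5, 0.5]   # assumed weights, illustration only
        features = [12, 10, 1, 0, 2, 3]                    # assumed board feature values
        print(v_hat(weights, features))                    # estimated value of this board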

  • Designing a Learning System: Chess game
    Function approximation algorithm:
    Training example: (<x1, ..., x6>, 100)
    x1: the number of black pieces on the board
    x2: the number of red pieces on the board
    x3: the number of black kings on the board
    x4: the number of red kings on the board
    x5: the number of black pieces threatened by red
    x6: the number of red pieces threatened by black
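
    One common way to fit the weights of V^ is the LMS (least mean squares) update rule, sketched below; the learning rate eta and the feature values are assumptions for illustration, and only the training value 100 comes from the slide:

        def lms_update(weights, features, v_train, eta=0.1):
            # One LMS step: w_i <- w_i + eta * (V_train(b) - V^(b)) * x_i, with x_0 = 1 for w0.
            v_estimate = weights[0] + sum(w * x for w, x in zip(weights[1:], features))
            error = v_train - v_estimate
            return ([weights[0] + eta * error] +
                    [w + eta * error * x for w, x in zip(weights[1:], features)])

        # Example: a board whose training value is 100, with assumed feature values.
        weights = [0.0] * 7
        weights = lms_update(weights, [12, 10, 1, 0, 2, 3], v_train=100)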

  • Designing a Learning System: What is learning?

  • Designing a Learning System: Learning is an (endless) generalization or induction process.

  • Designing a Learning System: Final design
    Experiment Generator → new problem (initial board)
    Performance System → solution trace (game history)
    Critic → training examples {(b1, V1), (b2, V2), ...}
    Generalizer → hypothesis (V^)

  • Issues in Machine Learning
    What learning algorithms should be used?
    How much training data is sufficient?
    When and how can prior knowledge guide the learning process?
    What is the best strategy for choosing the next training experience?
    What is the best way to reduce the learning task to one or more function approximation problems?
    How can the learner automatically alter its representation to improve its learning ability?

  • Example: Experience and Prediction
    [Training-experience table not shown; new days to predict:]
    Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
    5        Rainy  Cold     High      Strong  Warm   Change    ?
    6        Sunny  Warm     Normal    Strong  Warm   Same      ?
    7        Sunny  Warm     Low       Strong  Cool   Same      ?

  • Example: Learning problem
    Task T: classifying days on which my friend enjoys water sport
    Performance measure P: percent of days correctly classified
    Training experience E: days with given attributes and classifications

  • Concept Learning: Inferring a boolean-valued function from training examples of its input (instances) and output (classifications).

  • Concept Learning: Learning problem
    Target concept: a subset of the set of instances X; c: X → {0, 1}
    Target function: Sky × AirTemp × Humidity × Wind × Water × Forecast → {Yes, No}
    Hypothesis: characteristics of all instances of the concept to be learned, expressed as constraints on instance attributes; h: X → {0, 1}

  • Concept Learning: Satisfaction: h(x) = 1 iff x satisfies all the constraints of h; h(x) = 0 otherwise

    Consistency: h(x) = c(x) for every instance x of the training examples

    Correctness: h(x) = c(x) for every instance x of X
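
    A minimal sketch of the satisfaction and consistency checks for the attribute-constraint hypotheses introduced on the next slides ('?' accepts any value; None stands for the empty constraint that accepts no value); function names are ours:

        def satisfies(h, x):
            # h(x) = 1 iff every constraint of h is met by instance x
            return all(c == '?' or (c is not None and c == v) for c, v in zip(h, x))

        def consistent(h, examples):
            # h is consistent iff h(x) = c(x) for every training example (x, c(x))
            return all(satisfies(h, x) == label for x, label in examples)

    Correctness cannot be checked the same way in practice, since it quantifies over every instance of X rather than only the training examples.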

  • Concept Learning: How to represent a hypothesis function?

  • Concept Learning: Hypothesis representation (constraints on instance attributes):

    ? : any value is acceptable
    a single required value
    ∅ : no value is acceptable

  • Concept Learning: General-to-specific ordering of hypotheses:
    hj ≥g hk iff ∀x ∈ X: (hk(x) = 1) → (hj(x) = 1)

    [Figure: a lattice (partial order) on H, ordering example hypotheses h1, h2, h3 from specific to general]
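
    A sketch of the ≥g test for this representation (a hypothesis containing the empty constraint, written None here, satisfies no instance and is therefore below every other hypothesis):

        def more_general_or_equal(hj, hk):
            # hj >=g hk iff every instance satisfying hk also satisfies hj
            if any(c is None for c in hk):          # hk matches nothing: trivially true
                return True
            return all(cj == '?' or cj == ck for cj, ck in zip(hj, hk))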

  • FIND-S
    h = <∅, ∅, ∅, ∅, ∅, ∅>
    h = <Sunny, Warm, Normal, Strong, Warm, Same>
    h = <Sunny, Warm, ?, Strong, Warm, Same>
    h = <Sunny, Warm, ?, Strong, ?, ?>

  • FIND-S
    Initialize h to the most specific hypothesis in H
    For each positive training instance x:
      For each attribute constraint ai in h:
        If the constraint is not satisfied by x,
        then replace ai by the next more general constraint satisfied by x
    Output hypothesis h
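
    A direct sketch of FIND-S over this representation (None stands for the empty constraint, '?' for any value; examples are (instance, label) pairs and the function name is ours):

        def find_s(examples, n_attributes):
            h = [None] * n_attributes                 # most specific hypothesis in H
            for x, label in examples:
                if not label:                         # FIND-S ignores negative examples
                    continue
                for i, v in enumerate(x):
                    if h[i] is None:
                        h[i] = v                      # first positive example: adopt its values
                    elif h[i] != '?' and h[i] != v:
                        h[i] = '?'                    # next more general constraint satisfied by x
            return h

    On the four EnjoySport training examples used later in this chapter, find_s returns ['Sunny', 'Warm', '?', 'Strong', '?', '?'].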

  • FIND-S: Prediction with the learned hypothesis h = <Sunny, Warm, ?, Strong, ?, ?>

    Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
    5        Rainy  Cold     High      Strong  Warm   Change    No
    6        Sunny  Warm     Normal    Strong  Warm   Same      Yes
    7        Sunny  Warm     Low       Strong  Cool   Same      Yes

  • FIND-S: The output hypothesis is the most specific one that satisfies all positive training examples.

  • FIND-S: The result is consistent with the positive training examples.

  • FIND-S: Is the result consistent with the negative training examples?

  • FIND-S: h = <Sunny, Warm, ?, Strong, ?, ?>

  • FIND-S: The result is consistent with the negative training examples if the target concept is contained in H (and the training examples are correct).

  • FIND-S: The result is consistent with the negative training examples if the target concept is contained in H (and the training examples are correct).
    Sizes of the spaces:
    Size of the instance space: |X| = 3·2·2·2·2·2 = 96
    Size of the concept space: |C| = 2^|X| = 2^96
    Size of the hypothesis space: |H| = (4·3·3·3·3·3) + 1 = 973
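
    A quick check of these counts (Sky takes 3 values; each of the other five attributes takes 2):

        instance_space   = 3 * 2 * 2 * 2 * 2 * 2        # |X| = 96
        concept_space    = 2 ** instance_space          # |C| = 2**96 possible concepts
        hypothesis_space = 4 * 3 * 3 * 3 * 3 * 3 + 1    # (values + '?') per attribute, plus the
                                                        # single all-empty hypothesis: 973
        print(instance_space, hypothesis_space)
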
  • FIND-S: Questions
    Has the learner converged to the target concept, given that there can be several hypotheses consistent with the training examples (both positive and negative)?
    Why is the most specific hypothesis preferred?
    What if there are several maximally specific consistent hypotheses?
    What if the training examples are not correct?

  • List-then-Eliminate Algorithm
    Version space: the set of all hypotheses that are consistent with the training examples.
    Algorithm:
    Initial version space = set containing every hypothesis in H
    For each training example <x, c(x)>, remove from the version space any hypothesis h for which h(x) ≠ c(x)
    Output the hypotheses in the version space
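
    A sketch of List-then-Eliminate for the attribute-constraint representation (it enumerates only hypotheses without the empty constraint; satisfies() is the helper sketched earlier):

        from itertools import product

        def list_then_eliminate(examples, attribute_values):
            # attribute_values: list of the legal values of each attribute
            candidates = product(*[values + ['?'] for values in attribute_values])
            return [h for h in candidates
                    if all(satisfies(h, x) == label for x, label in examples)]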

  • List-then-Eliminate Algorithm: Requires an exhaustive enumeration of all hypotheses in H

  • Compact Representation of the Version Space
    G (the general boundary): the set of the most general hypotheses of H consistent with the training data D:
    G = {g ∈ H | consistent(g, D) ∧ ¬∃g' ∈ H: (g' >g g) ∧ consistent(g', D)}

    S (the specific boundary): the set of the most specific hypotheses of H consistent with the training data D:
    S = {s ∈ H | consistent(s, D) ∧ ¬∃s' ∈ H: (s >g s') ∧ consistent(s', D)}

  • Compact Representation of the Version Space
    Version space = {h ∈ H | ∃g ∈ G, ∃s ∈ S: g ≥g h ≥g s}

  • Candidate-Elimination Algorithm
    S0 = {<∅, ∅, ∅, ∅, ∅, ∅>}
    G0 = {<?, ?, ?, ?, ?, ?>}
    S1 = {<Sunny, Warm, Normal, Strong, Warm, Same>}
    G1 = {<?, ?, ?, ?, ?, ?>}
    S2 = {<Sunny, Warm, ?, Strong, Warm, Same>}
    G2 = {<?, ?, ?, ?, ?, ?>}
    S3 = {<Sunny, Warm, ?, Strong, Warm, Same>}
    G3 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>, <?, ?, ?, ?, ?, Same>}
    G4 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

    Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
    1        Sunny  Warm     Normal    Strong  Warm   Same      Yes
    2        Sunny  Warm     High      Strong  Warm   Same      Yes
    3        Rainy  Cold     High      Strong  Warm   Change    No
    4        Sunny  Warm     High      Strong  Cool   Change    Yes

  • Candidate-Elimination Algorithm
    S4 = {<Sunny, Warm, ?, Strong, ?, ?>}

    G4 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

  • Candidate-Elimination Algorithm
    Initialize G to the set of maximally general hypotheses in H
    Initialize S to the set of maximally specific hypotheses in H

  • Candidate-Elimination Algorithm
    For each positive example d:
      Remove from G any hypothesis inconsistent with d
      For each s in S that is inconsistent with d:
        Remove s from S
        Add to S all least generalizations h of s, such that h is consistent with d and some hypothesis in G is more general than h
      Remove from S any hypothesis that is more general than another hypothesis in S

  • Candidate-Elimination Algorithm
    For each negative example d:
      Remove from S any hypothesis inconsistent with d
      For each g in G that is inconsistent with d:
        Remove g from G
        Add to G all least specializations h of g, such that h is consistent with d and some hypothesis in S is more specific than h
      Remove from G any hypothesis that is more specific than another hypothesis in G
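
    A sketch of the two loops above for the conjunctive attribute-constraint representation ('?' = any value, None = the empty constraint), reusing satisfies() and more_general_or_equal() from the earlier sketches; helper names are ours:

        def candidate_elimination(examples, attribute_values):
            n = len(attribute_values)
            S = [tuple([None] * n)]                  # maximally specific boundary
            G = [tuple(['?'] * n)]                   # maximally general boundary
            for x, label in examples:
                if label:                            # positive example d
                    G = [g for g in G if satisfies(g, x)]
                    new_S = []
                    for s in S:
                        if satisfies(s, x):
                            new_S.append(s)
                        else:                        # least generalization of s covering x
                            h = tuple(v if c is None else (c if c == v else '?')
                                      for c, v in zip(s, x))
                            if any(more_general_or_equal(g, h) for g in G):
                                new_S.append(h)
                    S = [s for s in new_S
                         if not any(s != t and more_general_or_equal(s, t) for t in new_S)]
                else:                                # negative example d
                    S = [s for s in S if not satisfies(s, x)]
                    new_G = []
                    for g in G:
                        if not satisfies(g, x):
                            new_G.append(g)
                        else:                        # least specializations of g excluding x
                            for i, values in enumerate(attribute_values):
                                if g[i] == '?':
                                    for v in values:
                                        if v != x[i]:
                                            h = g[:i] + (v,) + g[i + 1:]
                                            if any(more_general_or_equal(h, s) for s in S):
                                                new_G.append(h)
                    G = [g for g in new_G
                         if not any(g != h and more_general_or_equal(h, g) for h in new_G)]
            return S, G

    On the four EnjoySport training examples above, this yields S4 = {<Sunny, Warm, ?, Strong, ?, ?>} and G4 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}, matching the trace shown earlier.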

  • Candidate-Elimination Algorithm
    The version space will converge toward the correct target concept if:
    H contains the correct target concept
    There are no errors in the training examples

    A training instance to be requested next should discriminate among the alternative hypotheses in the current version space.

  • Candidate-Elimination Algorithm
    A partially learned concept can be used to classify new instances using the majority rule.
    S4 = {<Sunny, Warm, ?, Strong, ?, ?>}

    G4 = {<Sunny, ?, ?, ?, ?, ?>, <?, Warm, ?, ?, ?, ?>}

    Example  Sky    AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
    5        Rainy  Warm     High      Strong  Cool   Same      ?
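
    A sketch of majority-rule classification: every hypothesis in the current version space votes on the new instance (satisfies() as sketched earlier; the version space itself can be enumerated with List-then-Eliminate or from the S and G boundaries):

        def classify_by_vote(version_space, x):
            pos = sum(1 for h in version_space if satisfies(h, x))
            neg = len(version_space) - pos
            if pos == neg:
                return None          # exact tie: the instance cannot be classified
            return pos > neg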

  • Inductive Bias
    Size of the instance space: |X| = 3·2·2·2·2·2 = 96
    Number of possible concepts: 2^|X| = 2^96
    Size of H: (4·3·3·3·3·3) + 1 = 973
    H is thus a biased hypothesis space: it can represent only 973 of the 2^96 possible concepts.
  • Inductive Bias
    An unbiased hypothesis space H can represent every subset of the instance space X, e.g. using propositional logic sentences over the instances.
    Positive examples: x1, x2, x3
    Negative examples: x4, x5

    h(x) ≡ (x = x1) ∨ (x = x2) ∨ (x = x3), written x1 ∨ x2 ∨ x3

    h(x) ≡ (x ≠ x4) ∧ (x ≠ x5), written ¬x4 ∧ ¬x5

  • Inductive Bias
    With specific boundary x1 ∨ x2 ∨ x3 and general boundary ¬x4 ∧ ¬x5, any new instance x (e.g. x6, covered by x1 ∨ x2 ∨ x3 ∨ x6 but not by x1 ∨ x2 ∨ x3) is classified as positive by half of the version space and as negative by the other half, so it cannot be classified.
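
    A tiny enumeration illustrating this: with an unbiased H containing every subset of a small instance space, each unseen instance is voted positive by exactly half of the consistent hypotheses. Instance names and labels below are arbitrary placeholders:

        from itertools import combinations

        X = ['x1', 'x2', 'x3', 'x4']
        positives, negatives = {'x1'}, {'x2'}        # assumed training labels

        # Version space: all subsets of X that contain every positive and no negative.
        version_space = [set(c) for r in range(len(X) + 1)
                         for c in combinations(X, r)
                         if positives <= set(c) and not (negatives & set(c))]

        unseen = 'x3'
        pos_votes = sum(1 for h in version_space if unseen in h)
        print(pos_votes, len(version_space) - pos_votes)   # 2 2: exactly half each way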

  • Inductive Bias

  • Inductive Bias

    [Table fragment: instances 3 (Good, High, ?) and 4 (Bad, Low, ?)]

  • Inductive Bias: A learner that makes no prior assumptions regarding the identity of the target concept cannot classify any unseen instances.

  • Homework: Exercises 2.1 to 2.5 (Chapter 2, ML textbook)