Transcript of « An Introduction to Machine Learning » — Support Vector Machines · 2018-03-22
Antoine Cornuéjols
AgroParisTech – INRA MIA 518
An Introduction to Machine Learning
Organization of the course
• 8 classes (days): courses and exercises
– 09/12: (AC) Introduction to ML; Artificial Neural Networks
– 09/14: (AC) How to evaluate learning; Representation Learning; Support Vector Machines
– 09/20: (CM) Uncertainties and probabilities; Dynamic Bayesian Networks and Hidden Markov Models
– 09/23: (CM) Dynamic Bayesian networks and Hidden Markov Models
– 09/30: (xx) Ensemble methods: Boosting, Bagging and random forests; Unsupervised learning: clustering algorithms
– 10/04: (xx) Change of representation, dimensionality reduction; Unsupervised learning: frequent item sets, association rules
– 10/11: (CM) Learning Bayesian networks
– 10/18: (CM) PRM: Probabilistic Relational Models
• Exercises in class + homeworks
• Documents
– The slides + information + exercises + homeworks on:
http://www.agroparistech.fr/ufr-info/membres/cornuejols/Teaching/Erasmus-IT4BI/Cours-DMML-IT4BI.html
« Introduction to Machine Learning » (A. Cornuéjols)
Outline for today
1. Introduction to Machine Learning
2. Concept learning: a 1st approach
3. Learning with Artificial Neural Networks
Introduction to Machine Learning
To learn?
Machine Learning as seen by a pioneer
« How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes? »
Tom Mitchell, 2006
Machine Learning
• Science of automated (aided) modeling
– Search for the underlying regularities in the world of observations
– Search for a model of the world that allows one to make predictions and take decisions
• Science of adaptive systems
❏ Reinforcement learning
❏ Simulated evolution
What is learning?
Looking for a model of the world, from observations, in order to make predictions and to understand.
What is learning?
Changes in a system that allow it to perform the same type of tasks as during training, with ever better performance.
Motivation
• Concepts difficult to hand-code
– Permissible moves for a robot
– Person to recruit / or not
– Predispositions for certain types of cancer
Learning from examples
Identifying regularities
Illustration
2. Modelling — 2.1 Types of models: constructive models
Data → patterns and predictions
Adaptation
• Association
• Imitation
• Behavioral learning:
– Learning to walk (Brooks's « insects »)
– Learning to act on an unknown planet
• Learning to play
– Adapting to the adversary
– Learning not to repeat past faults
– Learning to play within a team
• Teams of robots
Illustration: DARPA Grand Challenge (2005)
– 150-mile off-road robot race across the Mojave desert
– Natural and man-made hazards
– No driver, no remote control
– No dynamic passing
– Fastest vehicle wins the race (and a 2 million dollar prize)
Illustration
• Autonomous systems with learning
The inputs
Supervised Induction
Data: organization and types

Identifier | Gender | Age | Education level   | Married | Nb of children | Salary | Profession  | To prospect?
I_21       | M      | 43  | Master            | Y       | 3              | 55,000 | Architect   | YES
I_34       | M      | 25  | Sophomore         | N       | 0              | 21,000 | Nurse       | NO
I_38       | F      | 34  | PhD               | Y       | 2              | 35,000 | Univ. Prof. | YES
I_39       | F      | 67  | Bachelor          | Y       | 5              | 20,000 | Retired     | NO
I_58       | F      | 56  | Technical studies | Y       | 4              | 27,000 | Employee    | NO
I_73       | M      | 40  | Graduate          | N       | 2              | 31,000 | Salesman    | YES
I_81       | F      | 51  | Master            | Y       | 3              | 75,000 | CEO         | YES
Data: organization and types
In the table above, each row is an example (instance), each column is an attribute (feature) given by a descriptor, and the last column (“To prospect?”) is the label.
Types of inputs
§ Vectors
§ Sequences
§ Structured
§ Temporal
§ Spatial
Types of inputs
• Vectors
Example of vector inputs: the attribute-value table above, where each example (instance) is a vector of attribute (feature) values together with its label.
Types of inputs
• Sequences
Examples of sequence inputs:
– Text: « Protein sp|P00004|CYC_HORSE is activated by… »
– A DNA sequence:
1     ttcagttgtgaatgaatggacgtgccaaatagacgtgccgccgccgctcgattcgcactt
61    tgctttcggttttgccgtcgtttcacgcgtttagttccgttcggttcattcccagttctt
121   aaataccggacgtaaaaatacactctaacggtcccgcgaagaaaaagataaagacatctc
181   gtagaaatattaaaataaattcctaaagtcgttggtttctcgttcactttcgctgcctgc
…
4021  agaacacgccgaggctccattcatagcaccacttcgtcgtcttaatcccctccctcatcc
4081  gccatggcggtgcaaaaaataaaaagaactc
– A network configuration file:
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.0.X
NETMASK=255.255.255.0
GATEWAY=192.168.0.254
search exemple.com
nameserver 192.168.0.254
Types of inputs
• Structured
Example of structured input, in 1st-order logic:
block(B1) & ontable(B2) & above(B1,B2) & …
Types of inputs
• Temporal
• Spatial
(Temporal and spatial inputs were illustrated with figures in the original slides.)
Formats
– Numerical, continuous (ℝ): bank account: 12,915.86 €
– Numerical, discrete (ℕ or ℤ): number of dependents: 11
– Binary: single: True
– Category: color in {red, green, blue}
– Text: « Protein sp|P00004|CYC_HORSE is activated by… »
– Structured data: tree, XML expression, …
– Sequences: a genome; a sequence of queries on a website
– Images, videos
Three main types of learning
Types of learning tasks
1. Supervised learning
From a learning set S = {(xi, ui)}i=1..m, find an underlying dependency:
• in the form of a function h as close as possible to f (the target function), s.t. ui = f(xi)
• or in the form of a probability distribution P(xi, ui)
in order to make predictions about future (unknown) x.
Supervised learning
• A learning set S = {(x1, y1), (x2, y2), …, (xi, yi), …, (xm, ym)}, labeled by an unknown target function f
• Learn a hypothesis h
• Prediction for new examples: x –h→ y ?
Examples
• Learn to diagnose a disease
– x = description of the patient (symptoms, results of examinations, …)
– y = disease (or recommended therapy)
• Part-of-Speech tagging
❏ x = a sentence (e.g. « a star was born »)
❏ y = role of the words in the sentence
• Face recognition
❏ x = bitmap view of the face
❏ y = name of the person (or emotion: frightened, angry, …)
Supervised learning
• If f is a continuous function – regression; density estimation
• If f is a discrete function – classification
• If f is a binary (Boolean) function – concept learning
Supervised learning
• Discrimination
– One can predict that clients adding up international calls for more than 300 €/month, and who have made more than 3 complaints in the past, are likely to switch to another provider.
• Regression
– The number of accidents declared by a driver is inversely proportional to the duration of his or her driver's license, with coefficients that are specific to each gender.
Types of learning tasks
2. Unsupervised learning
From the learning set S = {(xi)}i=1..m, find underlying regularities of the whole set
– e.g. in the form of clusters (e.g. a mixture of Gaussians): CLUSTERING
in order to summarize, suggest regularities, understand, …
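Clustering can be sketched with k-means, the simplest algorithm of this family (the slide mentions mixtures of Gaussians; k-means is their hard-assignment cousin). The data and the choice k = 2 below are illustrative, not from the slides.

```python
import random

# Minimal k-means sketch on 1-D points (for brevity).
def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)           # pick k initial centers
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                      # assignment step
            j = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            clusters[j].append(p)
        for j, c in enumerate(clusters):      # update step: cluster means
            if c:
                centers[j] = sum(c) / len(c)
    return sorted(centers)

data = [1.0, 1.2, 0.8, 10.0, 10.3, 9.7]
print(kmeans(data, k=2))  # two centers, near 1.0 and 10.0
```

On well-separated data like this, the two centers converge to the means of the two obvious groups in a few iterations.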
Types of learning tasks
3. Reinforcement learning
(Figure: a perception–action loop between the agent and the environment, with a reward signal.)
The learning data are:
❏ a sequence of perceptions, actions and rewards (st, at, rt)t=1..∞
– with a reinforcement rt
– that can be related to actions made far before t
The problem: infer a function
perceived situation → action
so as to maximise a gain over the long term.
Akin to learning reflexes.
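As a minimal sketch of learning from (st, at, rt) triples, here is tabular Q-learning on a hypothetical 5-state corridor where only reaching the last state is rewarded; the environment and parameters are invented for illustration and are not from the slides.

```python
import random

# Tabular Q-learning on a 5-state corridor: states 0..4, actions -1/+1,
# reward 1.0 only when reaching state 4.
def q_learning(episodes=1000, alpha=0.5, gamma=0.9, eps=0.5, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(5) for a in (-1, 1)}
    for _ in range(episodes):
        s = 0
        for _ in range(200):                      # cap episode length
            if rng.random() < eps:                # epsilon-greedy choice
                a = rng.choice((-1, 1))
            else:
                a = max((-1, 1), key=lambda a: Q[(s, a)])
            s2 = min(4, max(0, s + a))            # move along the corridor
            r = 1.0 if s2 == 4 else 0.0           # reward only at the goal
            best = 0.0 if s2 == 4 else max(Q[(s2, b)] for b in (-1, 1))
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
            s = s2
            if s == 4:
                break
    return Q

Q = q_learning()
policy = [max((-1, 1), key=lambda a: Q[(s, a)]) for s in range(4)]
print(policy)  # greedy policy learned from the (s, a, r) triples
```

Note how the reward obtained at the goal is propagated backwards, so that actions made far before the reward end up with higher Q-values, exactly the credit-assignment issue mentioned above.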
A little bit of practicing
An example
• Given two examples:
– E1: a striped triangle above a plain black square
– E2: a plain white square above a striped circle
➠ Give a general description of these two examples
(Figure: E1 and E2, with four candidate descriptions A, B, C, D.)
Yet another exercise
• The examples are described using the attributes:
– number (1 or 2); size (small or large); form (circle or square); color (red or green)
• The objects belong to either class “+” or class “-”

Description           | Your answer | True answer
1 large red square    |             | -
1 large green square  |             | +
2 small red squares   |             | +
2 large red circles   |             | +
1 large green circle  |             | -
1 small red circle    |             | +
1 small green square  |             | +
1 small red square    |             | +
2 large green squares |             | -
Another learning problem
• We want to learn an unknown function from examples
Unknown function f: y = f(x1, x2, x3, x4)

Example | x1 x2 x3 x4 | Output
1       | 0  0  1  0  | 0
2       | 0  1  0  0  | 0
3       | 0  0  1  1  | 1
4       | 1  0  0  1  | 1
5       | 0  1  1  0  | 0
6       | 1  1  0  0  | 0
7       | 0  1  0  1  | 0

Can you guess the function?
Another learning problem: is learning possible?
• How many possible functions with 4 Boolean inputs and one Boolean output?

Example | x1 x2 x3 x4 | Label
1       | 0  0  0  0  | ?
2       | 0  0  0  1  | ?
3       | 0  0  1  0  | 0
4       | 0  0  1  1  | 1
5       | 0  1  0  0  | 0
6       | 0  1  0  1  | 0
7       | 0  1  1  0  | 0
8       | 0  1  1  1  | ?
9       | 1  0  0  0  | ?
10      | 1  0  0  1  | 1
11      | 1  0  1  0  | ?
12      | 1  0  1  1  | ?
13      | 1  1  0  0  | 0
14      | 1  1  0  1  | ?
15      | 1  1  1  0  | ?
16      | 1  1  1  1  | ?

• How many functions remain after 7 examples have been given?
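The counting argument can be checked by brute force: a Boolean function of 4 inputs is a truth table over the 16 input combinations, so there are 2^16 functions; the 7 examples pin down 7 of the 16 rows, leaving 2^9 = 512 candidates.

```python
from itertools import product

# The 7 labeled examples of the table: (x1,x2,x3,x4) -> output
examples = {
    (0, 0, 1, 0): 0, (0, 1, 0, 0): 0, (0, 0, 1, 1): 1, (1, 0, 0, 1): 1,
    (0, 1, 1, 0): 0, (1, 1, 0, 0): 0, (0, 1, 0, 1): 0,
}

inputs = list(product((0, 1), repeat=4))       # the 16 input combinations
total = 0
consistent = 0
for table in product((0, 1), repeat=16):       # one output bit per row
    total += 1
    f = dict(zip(inputs, table))
    if all(f[x] == y for x, y in examples.items()):
        consistent += 1
print(total, consistent)  # 65536 512
```

512 candidate functions remain, and they disagree on every still-unlabeled input: without further assumptions, the examples alone do not determine the answer.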
Another learning problem
• Look for a simple rule in the form of a conjunction of variables
• 16 possible functions (without negations)
(Examples as in the table above.)

Rule              | Counter-example
x1                | (6): 1 1 0 0 → 0
x2                | (2): 0 1 0 0 → 0
x3                | (5): 0 1 1 0 → 0
x4                | (7): 0 1 0 1 → 0
x1 ∧ x2           | (6)
x1 ∧ x3           | (3): 0 0 1 1 → 1
x1 ∧ x4           | (3)
x2 ∧ x3           | (3)
x2 ∧ x4           | (3)
x3 ∧ x4           | (4): 1 0 0 1 → 1
x1 ∧ x2 ∧ x3      | (3)
x1 ∧ x2 ∧ x4      | (3)
x1 ∧ x3 ∧ x4      | (3)
x2 ∧ x3 ∧ x4      | (3)
x1 ∧ x2 ∧ x3 ∧ x4 | (3)

None of these rules is correct.
Another learning problem
• Look for a simple rule in the form of “m of n”: the output is 1 iff at least m of the n chosen variables are 1
• 20 possible functions
(Examples as in the table above. Each cell gives the index of a counter-example for threshold m; ✓ marks a consistent rule.)

Variables        | m=1 | m=2 | m=3 | m=4
{x1}             | 3   |     |     |
{x2}             | 2   |     |     |
{x3}             | 1   |     |     |
{x4}             | 7   |     |     |
{x1, x2}         | 2   | 3   |     |
{x1, x3}         | 1   | 3   |     |
{x1, x4}         | 6   | 3   |     |
{x2, x3}         | 2   | 3   |     |
{x2, x4}         | 2   | 3   |     |
{x3, x4}         | 1   | 4   |     |
{x1, x2, x3}     | 1   | 3   | 3   |
{x1, x2, x4}     | 2   | 3   | 3   |
{x1, x3, x4}     | 1   | ✓   | 3   |
{x2, x3, x4}     | 1   | 5   | 3   |
{x1, x2, x3, x4} | 1   | 5   | 3   | 3

We found a rule: “2 of {x1, x3, x4}”!
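The exhaustive search of the slide can be replayed in a few lines; it confirms that “2 of {x1, x3, x4}” is the only m-of-n rule consistent with the seven examples.

```python
from itertools import combinations

# The seven examples: ((x1,x2,x3,x4), output)
examples = [
    ((0, 0, 1, 0), 0), ((0, 1, 0, 0), 0), ((0, 0, 1, 1), 1),
    ((1, 0, 0, 1), 1), ((0, 1, 1, 0), 0), ((1, 1, 0, 0), 0),
    ((0, 1, 0, 1), 0),
]

names = ("x1", "x2", "x3", "x4")
found = []
for n in range(1, 5):                          # size of the variable subset
    for subset in combinations(range(4), n):
        for m in range(1, n + 1):              # threshold: at least m ones
            if all((sum(x[i] for i in subset) >= m) == bool(y)
                   for x, y in examples):
                found.append((m, tuple(names[i] for i in subset)))
print(found)  # [(2, ('x1', 'x3', 'x4'))]
```

The point of the exercise survives the automation: learning succeeded only once we restricted the hypothesis space to a small family of rules.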
What about predicting sequences?
• Induction
– 1 2 3 5 …
– 1 1 1 2 1 1 2 1 1 1 1 1 2 2 1 3 1 2 2 1 1 …
– (1) (1 1) (2 1) (1 2 1 1) (1 1 1 2 2 1) (3 1 2 2 1 1)
– How?
– Why would it be possible to make inductions?
– Should an additional example increase confidence in the proposed rule?
– How many examples?
Supervised concept learning: a 1st approach
• Induction of a binary function (with values in {-1, +1})
Supervised concept learning: example
Learn → make predictions in X
• Method by nearest neighbours
• Necessity of a notion of distance
(Figure: + and - examples in the example space X, and a new point “+/-?” to classify.)
➠ Assumption of continuity in X
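A minimal 1-nearest-neighbour sketch: the "notion of distance" is Euclidean here, and prediction copies the label of the closest training example, which is exactly the assumption of continuity in X. The toy sample below is illustrative.

```python
import math

# 1-NN: predict the label of the closest training example.
def nn_predict(S, x):
    nearest = min(S, key=lambda ex: math.dist(ex[0], x))
    return nearest[1]

S = [((1.0, 1.0), "+"), ((1.2, 0.8), "+"),
     ((4.0, 4.2), "-"), ((4.5, 3.8), "-")]
print(nn_predict(S, (1.1, 1.0)))  # "+"
print(nn_predict(S, (4.2, 4.0)))  # "-"
```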
Supervised learning = a game between spaces
• E.g. concept learning
(Figure: the example space X, with + and - examples, and the hypothesis space H; the learning algorithm L maps the sample drawn from X to a hypothesis h ∈ H.)
Trying to approximate the « target concept »
(Figure: the chosen hypothesis h ∈ H defines a decision region in X that should separate the + from the - examples.)
Concept learning as search
➥ How to control the exploration of H?
(Figure: each hypothesis h ∈ H corresponds to a region of X; moving from hypothesis to hypothesis in H moves the decision region in X.)
Modification of the current hypothesis
How to correct a faulty hypothesis?
(Figure (a): a new example (xm+1, -1) falls inside the current hypothesis hm: hm must be specialized into hm+1. Figure (b): a new example (xm+1, +1) falls outside hm: hm must be generalized into hm+1.)
Hypotheses and examples
• h1: complete but incorrect
• h2: correct but incomplete
• h3: complete and correct: consistent
(Figure: three candidate hypotheses drawn over the + and 0 examples in X.)
H is structured
Supervised concept learning: the Version Space algorithm
Towards generalization
Inclusion in X and relation of generality in H
(Figure: a hypothesis ht and a more general hypothesis ht+1 in H; in X, coverage(ht) ⊂ coverage(ht+1).)
The general-to-specific ordering induced in H
The generality relation in H is induced by the inclusion relation in X: a hypothesis is more general than another iff its coverage includes the other's.
(Figure: hypotheses h1, h2, h3 in H and their coverages coverage(h1), coverage(h2), coverage(h3) in X.)
A generalization lattice in H
Partial ordering in H
(Figure: two hypotheses hi and hj, their most specific common generalization gms(hi, hj) and their most general common specialization smg(hi, hj).)
Operators to explore H
The operators
• Generalization – transform a description into a more general description
• Specialization – inverse of generalization
– (In general: produces a description which is a logical consequence of the initial description)
• Reformulation – transform a description into a new, logically equivalent, description
Generalization operators
• Rule: remove a conjunct
– A & B → C  ⇒  A → C
e.g. ferrari & red → costly  ⇒  ferrari → costly
• Rule: add an alternative
– A → C  ⇒  A ∨ B → C
e.g. ferrari → costly  ⇒  ferrari ∨ red → costly
• Rule: extend the range of a descriptor
– A & [B = R] → C  ⇒  A & [B = R′] → C, with R ⊂ R′
e.g. large & [color = red] → costly  ⇒  large & [color = red ∨ blue] → costly
Generalization operators
• Rule: climbing the hierarchy of the descriptors
– A & [B = n1] → C  and  A & [B = n2] → C  ⇒  A & [B = N] → C, where N is the parent of n1 and n2 in the hierarchy
e.g. corrosive & [element = chlorine] → toxic
     corrosive & [element = bromine] → toxic
  ⇒ corrosive & [element = halogen] → toxic
(Hierarchy: halogen is the parent of chlorine and bromine.)
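Two of these operators can be sketched on hypotheses represented as sets of attribute tests; the hierarchy dictionary below is a stand-in for the chlorine/bromine/halogen example, and the "state" attribute is an invented illustration.

```python
# Generalization operators on conjunctive hypotheses, represented as
# frozensets of (attribute, value) tests.

HIERARCHY = {"chlorine": "halogen", "bromine": "halogen"}  # child -> parent

def drop_conjunct(h):
    """All minimal generalizations obtained by removing one conjunct."""
    return [h - {c} for c in h]

def climb_hierarchy(h):
    """Replace a value by its parent class in the hierarchy, if any."""
    out = []
    for (attr, val) in h:
        if val in HIERARCHY:
            out.append(h - {(attr, val)} | {(attr, HIERARCHY[val])})
    return out

h = frozenset({("state", "corrosive"), ("element", "chlorine")})
print(drop_conjunct(h))   # two one-conjunct generalizations
print(climb_hierarchy(h)) # corrosive & [element = halogen]
```

Each operator returns hypotheses that cover at least everything the original covered, which is exactly the generality relation of the previous slides.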
The version space can be defined by two bounds
Representing the version space
Fundamental observation: the version space, structured by a partial order relation, can be defined by:
– an upper bound: the G-set
– a lower bound: the S-set
• G-set = set of the most general hypotheses that are consistent with the training examples
• S-set = set of the most specific hypotheses that are consistent with the training examples
(Figure: in H, the G-set above and the S-set below delimit the hypotheses hi, hj of the version space.)
The candidate elimination learning algorithm
Learning by successive updatings of the version space.
Idea: maintain the S-set and the G-set after each new example.
➥ Candidate elimination algorithm
Candidate elimination learning algorithm
Initialize S and G respectively by:
– the set of the most specific (resp. most general) hypotheses consistent with the 1st positive example provided.
For each new example (positive or negative):
– update S
– update G
Until convergence, or until S = G = Ø.
Updating S
• xi is negative
– Eliminate the hypotheses of S erroneously covering xi
• xi is positive
– Minimally generalize the hypotheses of S that do not cover xi, so that they cover it
– Then eliminate the hypotheses of S:
• covering one or more negative examples
• and/or that are more general than other hypotheses of S
Updating G
• xi is positive
– Eliminate the hypotheses of G that do not cover xi
• xi is negative
– Minimally specialize the hypotheses of G that cover xi, so that the new hypotheses do not cover it
– Then eliminate the hypotheses of G:
• that are not more general than at least one hypothesis of S
• and/or that are more specific than at least one other hypothesis of G
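The update rules above can be sketched for conjunctive attribute-value hypotheses (where '?' matches anything), in the style of Mitchell's classic presentation. This is a simplified version: S is kept as a single hypothesis (sufficient for conjunctions), and the EnjoySport-style data at the bottom is illustrative.

```python
# Candidate Elimination for conjunctive hypotheses over attribute vectors.
# A hypothesis is a tuple of attribute values, where '?' matches anything.

def covers(h, x):
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def more_general(h1, h2):
    """True iff h1 is at least as general as h2."""
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def candidate_elimination(examples):
    n = len(examples[0][0])
    S = None                                  # most specific bound
    G = [('?',) * n]                          # most general bound
    for x, positive in examples:
        if positive:
            G = [g for g in G if covers(g, x)]       # prune G
            if S is None:
                S = tuple(x)
            else:                                    # minimally generalize S
                S = tuple(sv if sv == xv else '?' for sv, xv in zip(S, x))
        else:
            newG = []
            for g in G:
                if not covers(g, x):
                    newG.append(g)
                    continue
                # minimal specializations excluding x but still covering S
                for i in range(n):
                    if g[i] == '?' and S and S[i] != '?' and S[i] != x[i]:
                        newG.append(g[:i] + (S[i],) + g[i + 1:])
            # drop members of G strictly less general than another member
            G = [g for g in newG
                 if not any(h != g and more_general(h, g) for h in newG)]
    return S, G

data = [  # illustrative EnjoySport-style sample
    (('sunny', 'warm', 'normal', 'strong', 'warm', 'same'), True),
    (('sunny', 'warm', 'high', 'strong', 'warm', 'same'), True),
    (('rainy', 'cold', 'high', 'strong', 'warm', 'change'), False),
    (('sunny', 'warm', 'high', 'strong', 'cool', 'change'), True),
]
S, G = candidate_elimination(data)
print(S)  # ('sunny', 'warm', '?', 'strong', '?', '?')
print(G)
```

Every hypothesis between S and a member of G remains consistent with the data, which is exactly the version space delimited by the two bounds.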
Candidate elimination learning algorithm
Updating the bounds S and G
(Figure: in H, positive examples (a)-(d) make the S bound move up by minimal generalization, while negative examples (a')-(d') make the G bound move down by minimal specialization.)
Exercise
Properties of the algorithm
• Incremental
• Complexity?
• How to use the result if convergence does not occur?
• What does S = G = Ø mean?
• Could the algorithm be used in an active learning mode?
• What to do if the data is noisy?
Illustration: LEX
(Architecture: a loop Problem generation → Problem solving → Critic → Generalization. Solving an exercise produces a detailed trace of the attempted solution, from which the critic extracts learning examples used to refine partially learned heuristics.)

Illustration: LEX (2)
Compute the integral of: ∫ 3x cos(x) dx
∫ 3x cos(x) dx
= 3x sin(x) - ∫ 3 sin(x) dx      (OP2 with u = 3x, dv = cos(x) dx)
= 3x sin(x) - 3 ∫ sin(x) dx      (OP1)
= 3x sin(x) + 3 cos(x) + C       (OP5)

One of the proposed positive examples:
∫ 3x cos(x) dx → apply OP2 with u = 3x, dv = cos(x) dx
Version space for the use of operator OP2:
S = { ∫ 3x cos(x) dx → apply OP2 with u = 3x, dv = cos(x) dx }
G = { ∫ f1(x) f2(x) dx → apply OP2 with u = f1(x), dv = f2(x) dx }
Things that we left aside
How to code the inputs
Learning is easy when we know what to look for
(Figure: three candidate patterns, labelled Yes, Yes, No.)
Inputs and prior knowledge
• Is it a pattern recognition task? A character recognition task? …
• How to code the examples?
0 1 1 1 1 1 1 0 1 1 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 0
• A right choice of representation can render the learning task trivial
➠ But how can we know the right representation?
A learning puzzle
(Figure: examples labelled f = -1 and f = +1, and a new case with f = ?)
Conclusions
Conclusions
• Several types of learning tasks
– Supervised; unsupervised; reinforcement learning
– But also semi-supervised learning, ranking, on-line learning, …
• Supervised learning = search in a hypothesis space
– How to code the inputs?
– How to choose the hypothesis space H?
– How to explore H?
Wait a minute…
Am I right to choose the hypothesis as I did?
Suppose we choose to learn “rectangles”
• How to choose a rectangle?
(Figures: in the (x, y) plane, the most general hypotheses are the largest rectangles still excluding the negatives; the most specific hypothesis is the tightest rectangle around the positive examples.)
Suppose we choose to learn “rectangles”
• How to learn? Choice of a hypothesis h in the Version Space
• Which performance?
(Figure: a candidate rectangle h in the (x, y) plane.)
A little bit of theory about supervised induction
• What is the expected performance of a hypothesis h that makes no errors on the training set S?
The “empirical risk”:
R_emp(h) = (1/m) Σi=1..m ℓ(h(xi), yi)
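The empirical risk is a one-liner once a loss is chosen; the sketch below uses the 0-1 loss, and the toy threshold hypothesis and sample are illustrative.

```python
# Empirical risk with 0-1 loss: R_emp(h) = (1/m) * sum_i loss(h(x_i), y_i)
def empirical_risk(h, S, loss=lambda yhat, y: int(yhat != y)):
    return sum(loss(h(x), y) for x, y in S) / len(S)

# Toy sample and hypothesis: h predicts +1 when x >= 0
S = [(-2.0, -1), (-1.0, -1), (0.5, +1), (1.0, -1)]
h = lambda x: +1 if x >= 0 else -1
print(empirical_risk(h, S))  # 1 error out of 4 -> 0.25
```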
A little bit of theory about supervised induction
• Learning strategy:
– Choice of a hypothesis of null empirical risk (no error on the learning set S)
– What is the expected performance for h?
(Figure: the target concept f and the learned rectangle h in the (x, y) plane.)
A little bit of theory about supervised induction
– Choice of a hypothesis of null empirical risk (no error on the learning set S)
– What is the expected performance for h?
– What is the risk of having an error R(h) > ε?
(Figure: the error region h Δ f, the symmetric difference between h and the target f.)
A little bit of theory about supervised induction
Which performance?
• Cost of a prediction error: the loss function ℓ(h(x), y)
• Which expected cost if I choose h? The expected cost is the “real risk”:
R(h) = ∫X×Y ℓ(h(x), y) pXY(x, y) dx dy
A little bit of theory about supervised induction
• Let us suppose that h is such that R(h) = pX(h Δ f) ≥ ε.
• What is the probability that h had been chosen nonetheless, i.e. that every training example “falls” outside h Δ f?
After one example: p(R_emp(h) = 0) ≤ 1 - ε
After m examples (i.i.d.): pm(R_emp(h) = 0) ≤ (1 - ε)^m
We want: ∀ε, δ ∈ [0, 1]: pm(R(h) ≥ ε) ≤ δ
(Figure: f, h and the error region h Δ f in the (x, y) plane.)
A little bit of theory about supervised induction
• We want: ∀ε, δ ∈ [0, 1]: pm(R(h) ≥ ε) ≤ δ
That is: (1 - ε)^m ≤ δ
Since (1 - ε)^m ≤ e^(-εm), it suffices that: e^(-εm) ≤ δ
Hence: -εm ≤ ln(δ), i.e.
m ≥ ln(1/δ) / ε
True for ONE hypothesis.
A little bit of theory about supervised induction
• The hypothesis is chosen within H with null error on S (« realisable case »)
• We suppose: |H| < ∞
• We want: ∀ε, δ ∈ [0, 1]: pm(∃h : R_emp(h) = 0 and R(h) ≥ ε) ≤ δ
Then, by the union bound: |H| (1 - ε)^m ≤ |H| e^(-εm) = δ
Hence: -εm ≤ ln(δ) - ln(|H|), i.e.
m ≥ (1/ε) (ln|H| + ln(1/δ))
True for the best hypothesis in H.
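The two sample-size bounds are easy to evaluate numerically; the values of ε, δ and |H| below are arbitrary illustrative choices (|H| = 2^16 echoes the Boolean-function exercise earlier).

```python
import math

# Sample-size bounds from the slides:
#   one hypothesis:             m >= ln(1/delta) / eps
#   finite H, realisable case:  m >= (1/eps) * (ln|H| + ln(1/delta))
def m_single(eps, delta):
    return math.ceil(math.log(1 / delta) / eps)

def m_finite_H(eps, delta, H_size):
    return math.ceil((math.log(H_size) + math.log(1 / delta)) / eps)

print(m_single(0.1, 0.05))           # 30 examples suffice for one hypothesis
print(m_finite_H(0.1, 0.05, 2**16))  # 141: the union bound over |H| = 65536
```

Note that the price of searching a whole hypothesis space only enters through ln|H|: doubling |H| adds a constant, not a factor, to the required sample size.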
A little bit of theory about supervised induction
• The hypothesis is chosen within H with lowest error on S: the « non-realisable case »
Same kind of analysis, but more involved. [Vapnik, 1982, 1989]
Bounds between the real risk and the empirical risk
• H finite, realisable case:
∀h ∈ H, ∀δ ≤ 1: Pm[ R_real(h) ≤ R_emp(h) + (log|H| + log(1/δ)) / m ] > 1 - δ
• H finite, non-realisable case:
∀h ∈ H, ∀δ ≤ 1: Pm[ R_real(h) ≤ R_emp(h) + √( (log|H| + log(1/δ)) / (2m) ) ] > 1 - δ
Bounds between the real risk and the empirical risk
• Non-realisable case and H not finite: how to proceed?
– General approach:
1. Reduce the study of the infinite case to the analysis of a finite set of hypotheses
2. Measure how much it is possible, for any training set S of labeled points, to find a hypothesis in H that can fit S
Bounds between the real risk and the empirical risk
• Measure using the Vapnik-Chervonenkis dimension
– A purely combinatorial measure, which does not depend on the number of examples
– Size of the largest set of points that can be labelled in every possible way by the hypotheses in H:
dVC(H) = max{ m : ΠH(m) = 2^m }
One can compute the bound:
∀h ∈ H, ∀δ ≤ 1: Pm[ R_real(h) ≤ R_emp(h) + √( (8 dVC(H) log(2em / dVC(H)) + 8 log(4/δ)) / m ) ] > 1 - δ
VC-dim: illustration
• dVC(linear separators in the plane) = ?
(Figures (a)-(c): configurations of + and - points that a line can or cannot separate.)
• dVC(rectangles) = ?
(Figures (a)-(d): configurations of + and - points for axis-parallel rectangles.)
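Shattering by axis-parallel rectangles can be checked by brute force: a labelling is realisable iff the bounding box of the positive points contains no negative point. The sketch below exhibits one 4-point "diamond" that rectangles shatter, and shows that adding its centre breaks the shattering; a full proof that dVC = 4 would need the argument for every 5-point set.

```python
from itertools import combinations

# A point set is shattered by axis-parallel rectangles iff, for every
# subset taken as positives, the positives' bounding box excludes all
# negatives (the tightest consistent rectangle is that bounding box).
def shattered_by_rectangles(points):
    def bbox_contains(pos, p):
        xs = [q[0] for q in pos]
        ys = [q[1] for q in pos]
        return min(xs) <= p[0] <= max(xs) and min(ys) <= p[1] <= max(ys)
    for r in range(1, len(points)):           # empty/full subsets are trivial
        for pos in combinations(points, r):
            neg = [p for p in points if p not in pos]
            if any(bbox_contains(pos, p) for p in neg):
                return False
    return True

diamond = [(0, 1), (1, 0), (2, 1), (1, 2)]
print(shattered_by_rectangles(diamond))             # True: 4 points shattered
print(shattered_by_rectangles(diamond + [(1, 1)]))  # False: the centre breaks it
```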