Transcript of « An Introduction to Machine Learning » — Support Vector Machines · 2018-03-22
Antoine Cornuéjols
AgroParisTech – INRA MIA 518
An Introduction to Machine Learning
Organization of the course
• 8 classes (days): courses and exercises
– 09/12: (AC) Introduction to ML; Artificial Neural Networks
– 09/14: (AC) How to evaluate learning; Representation Learning; Support Vector Machines
– 09/20: (CM) Uncertainties and probabilities; Dynamic Bayesian Networks and Hidden Markov Models
– 09/23: (CM) Dynamic Bayesian networks and Hidden Markov Models
– 09/30: (xx) Ensemble methods: Boosting, Bagging and random forests; Unsupervised learning: clustering algorithms
– 10/04: (xx) Change of representation, dimensionality reduction; Unsupervised learning: frequent item sets, association rules
– 10/11: (CM) Learning Bayesian networks
– 10/18: (CM) PRM: Probabilistic Relational Models
• Exercises in class + homeworks
• Documents
– The slides + information + exercises + homeworks on:
http://www.agroparistech.fr/ufr-info/membres/cornuejols/Teaching/Erasmus-IT4BI/Cours-DMML-IT4BI.html
« Introduction to Machine Learning » (A. Cornuéjols)
Outline for today
1. Introduction to Machine Learning
2. Concept learning: a 1st approach
3. Learning with Artificial Neural Networks
Introduction to Machine Learning
To learn?
Machine Learning as seen by a pioneer
« How can we build computer systems that automatically improve with experience, and what are the fundamental laws that govern all learning processes? »
Tom Mitchell, 2006
Machine Learning
• Science of automated (aided) modeling
– Search for the underlying regularities in the world of observations
– Search for a model of the world that allows one to make predictions and take decisions
• Science of adaptive systems
❏ Reinforcement learning
❏ Simulated evolution
What is learning?
Looking for a model of the world, from observations, in order to make predictions and to understand.
What is learning?
Changes in a system that allow it to perform the same type of tasks as during training, with ever better performance.
Motivation
• Concepts difficult to hand-code
– Permissible moves for a robot
– Person to recruit / or not
– Predispositions for certain types of cancer
Learning from examples
Identifying regularities
Illustration
2. Modelling — 2.1 Types of models: constructive models
Data → patterns and predictions
Adaptation
• Association
• Imitation
• Behavioral learning:
– Learning to walk (Brooks's « insects »)
– Learning to act on an unknown planet
• Learning to play
– Adapting to the adversary
– Learning not to repeat past faults
– Learning to play within a team
• Teams of robots
Illustration: DARPA Grand Challenge (2005)
– 150-mile off-road robot race across the Mojave desert
– Natural and man-made hazards
– No driver, no remote control
– No dynamic passing
– Fastest vehicle wins the race (and a 2 million dollar prize)
Illustration
• Autonomous systems with learning
The inputs
Supervised Induction
Data: organization and types

Identifier | Gender | Age | Education level   | Married | Nb of children | Salary | Profession  | To prospect?
I_21       | M      | 43  | Master            | Y       | 3              | 55,000 | Architect   | YES
I_34       | M      | 25  | Sophomore         | N       | 0              | 21,000 | Nurse       | NO
I_38       | F      | 34  | PhD               | Y       | 2              | 35,000 | Univ. Prof. | YES
I_39       | F      | 67  | Bachelor          | Y       | 5              | 20,000 | Retired     | NO
I_58       | F      | 56  | Technical studies | Y       | 4              | 27,000 | Employee    | NO
I_73       | M      | 40  | Graduate          | N       | 2              | 31,000 | Salesman    | YES
I_81       | F      | 51  | Master            | Y       | 3              | 75,000 | CEO         | YES
Data: organization and types
In the table above, each row is an example (instance), each column is an attribute (feature) given by a descriptor, and the last column (“To prospect?”) is the label.
Types of inputs
§ Vectors
§ Sequences
§ Structured
§ Temporal
§ Spatial
Types of inputs
• Vectors
Example of vector inputs: the attribute-value table above, where each example (instance) is a vector of attribute (feature) values together with its label.
Types of inputs
• Sequences
Examples of sequence inputs:
– Text: « Protein sp|P00004|CYC_HORSE is activated by… »
– A DNA sequence:
1     ttcagttgtgaatgaatggacgtgccaaatagacgtgccgccgccgctcgattcgcactt
61    tgctttcggttttgccgtcgtttcacgcgtttagttccgttcggttcattcccagttctt
121   aaataccggacgtaaaaatacactctaacggtcccgcgaagaaaaagataaagacatctc
181   gtagaaatattaaaataaattcctaaagtcgttggtttctcgttcactttcgctgcctgc
…
4021  agaacacgccgaggctccattcatagcaccacttcgtcgtcttaatcccctccctcatcc
4081  gccatggcggtgcaaaaaataaaaagaactc
– A network configuration file:
DEVICE=eth0
BOOTPROTO=none
ONBOOT=yes
IPADDR=192.168.0.X
NETMASK=255.255.255.0
GATEWAY=192.168.0.254
search exemple.com
nameserver 192.168.0.254
Types of inputs
• Structured
Example of structured input, in 1st-order logic:
block(B1) & ontable(B2) & above(B1,B2) & …
Types of inputs
• Temporal
• Spatial
(Temporal and spatial inputs were illustrated with figures in the original slides.)
Formats
– Numerical, continuous (ℝ): bank account: 12,915.86 €
– Numerical, discrete (ℕ or ℤ): number of dependents: 11
– Binary: single: True
– Category: color in {red, green, blue}
– Text: « Protein sp|P00004|CYC_HORSE is activated by… »
– Structured data: tree, XML expression, …
– Sequences: a genome; a sequence of queries on a website
– Images, videos
Three main types of learning
Types of learning tasks
1. Supervised learning
From a learning set S = {(xi, ui)}i=1..m, find an underlying dependency:
• in the form of a function h as close as possible to f (the target function), s.t. ui = f(xi)
• or in the form of a probability distribution P(xi, ui)
in order to make predictions about future (unknown) x.
Supervised learning
• A learning set S = {(x1, y1), (x2, y2), …, (xi, yi), …, (xm, ym)}, labeled by an unknown target function f
• Learn a hypothesis h
• Prediction for new examples: x –h→ y ?
Examples
• Learn to diagnose a disease
– x = description of the patient (symptoms, results of examinations, …)
– y = disease (or recommended therapy)
• Part-of-Speech tagging
❏ x = a sentence (e.g. « a star was born »)
❏ y = role of the words in the sentence
• Face recognition
❏ x = bitmap view of the face
❏ y = name of the person (or emotion: frightened, angry, …)
Supervised learning
• If f is a continuous function – regression; density estimation
• If f is a discrete function – classification
• If f is a binary (Boolean) function – concept learning
Supervised learning
• Discrimination
– One can predict that clients adding up international calls for more than 300 €/month, and who have made more than 3 complaints in the past, are likely to switch to another provider.
• Regression
– The number of accidents declared by a driver is inversely proportional to the duration of his or her driver's license, with coefficients that are specific to each gender.
Types of learning tasks
2. Unsupervised learning
From the learning set S = {(xi)}i=1..m, find underlying regularities of the whole set
– e.g. in the form of clusters (e.g. a mixture of Gaussians): CLUSTERING
in order to summarize, suggest regularities, understand, …
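Clustering can be sketched with k-means, the simplest algorithm of this family (the slide mentions mixtures of Gaussians; k-means is their hard-assignment cousin). The data and the choice k = 2 below are illustrative, not from the slides.

```python
import random

# Minimal k-means sketch on 1-D points (for brevity).
def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)           # pick k initial centers
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:                      # assignment step
            j = min(range(k), key=lambda j: (p - centers[j]) ** 2)
            clusters[j].append(p)
        for j, c in enumerate(clusters):      # update step: cluster means
            if c:
                centers[j] = sum(c) / len(c)
    return sorted(centers)

data = [1.0, 1.2, 0.8, 10.0, 10.3, 9.7]
print(kmeans(data, k=2))  # two centers, near 1.0 and 10.0
```

On well-separated data like this, the two centers converge to the means of the two obvious groups in a few iterations.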
Types of learning tasks
3. Reinforcement learning
(Figure: a perception–action loop between the agent and the environment, with a reward signal.)
The learning data are:
❏ a sequence of perceptions, actions and rewards (st, at, rt)t=1..∞
– with a reinforcement rt
– that can be related to actions made far before t
The problem: infer a function
perceived situation → action
so as to maximise a gain over the long term.
Akin to learning reflexes.
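As a minimal sketch of learning from (st, at, rt) triples, here is tabular Q-learning on a hypothetical 5-state corridor where only reaching the last state is rewarded; the environment and parameters are invented for illustration and are not from the slides.

```python
import random

# Tabular Q-learning on a 5-state corridor: states 0..4, actions -1/+1,
# reward 1.0 only when reaching state 4.
def q_learning(episodes=1000, alpha=0.5, gamma=0.9, eps=0.5, seed=0):
    rng = random.Random(seed)
    Q = {(s, a): 0.0 for s in range(5) for a in (-1, 1)}
    for _ in range(episodes):
        s = 0
        for _ in range(200):                      # cap episode length
            if rng.random() < eps:                # epsilon-greedy choice
                a = rng.choice((-1, 1))
            else:
                a = max((-1, 1), key=lambda a: Q[(s, a)])
            s2 = min(4, max(0, s + a))            # move along the corridor
            r = 1.0 if s2 == 4 else 0.0           # reward only at the goal
            best = 0.0 if s2 == 4 else max(Q[(s2, b)] for b in (-1, 1))
            Q[(s, a)] += alpha * (r + gamma * best - Q[(s, a)])
            s = s2
            if s == 4:
                break
    return Q

Q = q_learning()
policy = [max((-1, 1), key=lambda a: Q[(s, a)]) for s in range(4)]
print(policy)  # greedy policy learned from the (s, a, r) triples
```

Note how the reward obtained at the goal is propagated backwards, so that actions made far before the reward end up with higher Q-values, exactly the credit-assignment issue mentioned above.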
A little bit of practicing
An example
• Given two examples:
– E1: a striped triangle above a plain black square
– E2: a plain white square above a striped circle
➠ Give a general description of these two examples
(Figure: E1 and E2, with four candidate descriptions A, B, C, D.)
Yet another exercise
• The examples are described using the attributes:
– number (1 or 2); size (small or large); form (circle or square); color (red or green)
• The objects belong to either class “+” or class “-”

Description           | Your answer | True answer
1 large red square    |             | -
1 large green square  |             | +
2 small red squares   |             | +
2 large red circles   |             | +
1 large green circle  |             | -
1 small red circle    |             | +
1 small green square  |             | +
1 small red square    |             | +
2 large green squares |             | -
Another learning problem
• We want to learn an unknown function from examples
Unknown function f: y = f(x1, x2, x3, x4)

Example | x1 x2 x3 x4 | Output
1       | 0  0  1  0  | 0
2       | 0  1  0  0  | 0
3       | 0  0  1  1  | 1
4       | 1  0  0  1  | 1
5       | 0  1  1  0  | 0
6       | 1  1  0  0  | 0
7       | 0  1  0  1  | 0

Can you guess the function?
Another learning problem: is learning possible?
• How many possible functions with 4 Boolean inputs and one Boolean output?

Example | x1 x2 x3 x4 | Label
1       | 0  0  0  0  | ?
2       | 0  0  0  1  | ?
3       | 0  0  1  0  | 0
4       | 0  0  1  1  | 1
5       | 0  1  0  0  | 0
6       | 0  1  0  1  | 0
7       | 0  1  1  0  | 0
8       | 0  1  1  1  | ?
9       | 1  0  0  0  | ?
10      | 1  0  0  1  | 1
11      | 1  0  1  0  | ?
12      | 1  0  1  1  | ?
13      | 1  1  0  0  | 0
14      | 1  1  0  1  | ?
15      | 1  1  1  0  | ?
16      | 1  1  1  1  | ?

• How many functions remain after 7 examples have been given?
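The counting argument can be checked by brute force: a Boolean function of 4 inputs is a truth table over the 16 input combinations, so there are 2^16 functions; the 7 examples pin down 7 of the 16 rows, leaving 2^9 = 512 candidates.

```python
from itertools import product

# The 7 labeled examples of the table: (x1,x2,x3,x4) -> output
examples = {
    (0, 0, 1, 0): 0, (0, 1, 0, 0): 0, (0, 0, 1, 1): 1, (1, 0, 0, 1): 1,
    (0, 1, 1, 0): 0, (1, 1, 0, 0): 0, (0, 1, 0, 1): 0,
}

inputs = list(product((0, 1), repeat=4))       # the 16 input combinations
total = 0
consistent = 0
for table in product((0, 1), repeat=16):       # one output bit per row
    total += 1
    f = dict(zip(inputs, table))
    if all(f[x] == y for x, y in examples.items()):
        consistent += 1
print(total, consistent)  # 65536 512
```

512 candidate functions remain, and they disagree on every still-unlabeled input: without further assumptions, the examples alone do not determine the answer.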
Another learning problem
• Look for a simple rule in the form of a conjunction of variables
• 16 possible functions (without negations)
(Examples as in the table above.)

Rule              | Counter-example
x1                | (6): 1 1 0 0 → 0
x2                | (2): 0 1 0 0 → 0
x3                | (5): 0 1 1 0 → 0
x4                | (7): 0 1 0 1 → 0
x1 ∧ x2           | (6)
x1 ∧ x3           | (3): 0 0 1 1 → 1
x1 ∧ x4           | (3)
x2 ∧ x3           | (3)
x2 ∧ x4           | (3)
x3 ∧ x4           | (4): 1 0 0 1 → 1
x1 ∧ x2 ∧ x3      | (3)
x1 ∧ x2 ∧ x4      | (3)
x1 ∧ x3 ∧ x4      | (3)
x2 ∧ x3 ∧ x4      | (3)
x1 ∧ x2 ∧ x3 ∧ x4 | (3)

None of these rules is correct.
Another learning problem
• Look for a simple rule in the form of “m of n”: the output is 1 iff at least m of the n chosen variables are 1
• 20 possible functions
(Examples as in the table above. Each cell gives the index of a counter-example for threshold m; ✓ marks a consistent rule.)

Variables        | m=1 | m=2 | m=3 | m=4
{x1}             | 3   |     |     |
{x2}             | 2   |     |     |
{x3}             | 1   |     |     |
{x4}             | 7   |     |     |
{x1, x2}         | 2   | 3   |     |
{x1, x3}         | 1   | 3   |     |
{x1, x4}         | 6   | 3   |     |
{x2, x3}         | 2   | 3   |     |
{x2, x4}         | 2   | 3   |     |
{x3, x4}         | 1   | 4   |     |
{x1, x2, x3}     | 1   | 3   | 3   |
{x1, x2, x4}     | 2   | 3   | 3   |
{x1, x3, x4}     | 1   | ✓   | 3   |
{x2, x3, x4}     | 1   | 5   | 3   |
{x1, x2, x3, x4} | 1   | 5   | 3   | 3

We found a rule: “2 of {x1, x3, x4}”!
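The exhaustive search of the slide can be replayed in a few lines; it confirms that “2 of {x1, x3, x4}” is the only m-of-n rule consistent with the seven examples.

```python
from itertools import combinations

# The seven examples: ((x1,x2,x3,x4), output)
examples = [
    ((0, 0, 1, 0), 0), ((0, 1, 0, 0), 0), ((0, 0, 1, 1), 1),
    ((1, 0, 0, 1), 1), ((0, 1, 1, 0), 0), ((1, 1, 0, 0), 0),
    ((0, 1, 0, 1), 0),
]

names = ("x1", "x2", "x3", "x4")
found = []
for n in range(1, 5):                          # size of the variable subset
    for subset in combinations(range(4), n):
        for m in range(1, n + 1):              # threshold: at least m ones
            if all((sum(x[i] for i in subset) >= m) == bool(y)
                   for x, y in examples):
                found.append((m, tuple(names[i] for i in subset)))
print(found)  # [(2, ('x1', 'x3', 'x4'))]
```

The point of the exercise survives the automation: learning succeeded only once we restricted the hypothesis space to a small family of rules.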
What about predicting sequences?
• Induction
– 1 2 3 5 …
– 1 1 1 2 1 1 2 1 1 1 1 1 2 2 1 3 1 2 2 1 1 …
– (1) (1 1) (2 1) (1 2 1 1) (1 1 1 2 2 1) (3 1 2 2 1 1)
– How?
– Why would it be possible to make inductions?
– Should an additional example increase confidence in the proposed rule?
– How many examples?
Supervised concept learning: a 1st approach
• Induction of a binary function (with values in {-1, +1})
Supervised concept learning: example
Learn → make predictions in X
• Method by nearest neighbours
• Necessity of a notion of distance
(Figure: + and - examples in the example space X, and a new point “+/-?” to classify.)
➠ Assumption of continuity in X
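A minimal 1-nearest-neighbour sketch: the "notion of distance" is Euclidean here, and prediction copies the label of the closest training example, which is exactly the assumption of continuity in X. The toy sample below is illustrative.

```python
import math

# 1-NN: predict the label of the closest training example.
def nn_predict(S, x):
    nearest = min(S, key=lambda ex: math.dist(ex[0], x))
    return nearest[1]

S = [((1.0, 1.0), "+"), ((1.2, 0.8), "+"),
     ((4.0, 4.2), "-"), ((4.5, 3.8), "-")]
print(nn_predict(S, (1.1, 1.0)))  # "+"
print(nn_predict(S, (4.2, 4.0)))  # "-"
```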
Supervised learning = a game between spaces
• E.g. concept learning
(Figure: the example space X, with + and - examples, and the hypothesis space H; the learning algorithm L maps the sample drawn from X to a hypothesis h ∈ H.)
Trying to approximate the « target concept »
(Figure: the chosen hypothesis h ∈ H defines a decision region in X that should separate the + from the - examples.)
Concept learning as search
➥ How to control the exploration of H?
(Figure: each hypothesis h ∈ H corresponds to a region of X; moving from hypothesis to hypothesis in H moves the decision region in X.)
Modification of the current hypothesis
How to correct a faulty hypothesis?
(Figure (a): a new example (xm+1, -1) falls inside the current hypothesis hm: hm must be specialized into hm+1. Figure (b): a new example (xm+1, +1) falls outside hm: hm must be generalized into hm+1.)
Hypotheses and examples
• h1: complete but incorrect
• h2: correct but incomplete
• h3: complete and correct: consistent
(Figure: three candidate hypotheses drawn over the + and 0 examples in X.)
H is structured
Supervised concept learning: the Version Space algorithm
Towards generalization
Inclusion in X and relation of generality in H
(Figure: a hypothesis ht and a more general hypothesis ht+1 in H; in X, coverage(ht) ⊂ coverage(ht+1).)
The general-to-specific ordering induced in H
The generality relation in H is induced by the inclusion relation in X: a hypothesis is more general than another iff its coverage includes the other's.
(Figure: hypotheses h1, h2, h3 in H and their coverages coverage(h1), coverage(h2), coverage(h3) in X.)
A generalization lattice in H
Partial ordering in H
(Figure: two hypotheses hi and hj, their most specific common generalization gms(hi, hj) and their most general common specialization smg(hi, hj).)
Operators to explore H
The operators
• Generalization – transform a description into a more general description
• Specialization – inverse of generalization
– (In general: produces a description which is a logical consequence of the initial description)
• Reformulation – transform a description into a new, logically equivalent, description
Generalization operators
• Rule: remove a conjunct
– A & B → C  ⇒  A → C
e.g. ferrari & red → costly  ⇒  ferrari → costly
• Rule: add an alternative
– A → C  ⇒  A ∨ B → C
e.g. ferrari → costly  ⇒  ferrari ∨ red → costly
• Rule: extend the range of a descriptor
– A & [B = R] → C  ⇒  A & [B = R′] → C, with R ⊂ R′
e.g. large & [color = red] → costly  ⇒  large & [color = red ∨ blue] → costly
Generalization operators
• Rule: climbing the hierarchy of the descriptors
– A & [B = n1] → C  and  A & [B = n2] → C  ⇒  A & [B = N] → C, where N is the parent of n1 and n2 in the hierarchy
e.g. corrosive & [element = chlorine] → toxic
     corrosive & [element = bromine] → toxic
  ⇒ corrosive & [element = halogen] → toxic
(Hierarchy: halogen is the parent of chlorine and bromine.)
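Two of these operators can be sketched on hypotheses represented as sets of attribute tests; the hierarchy dictionary below is a stand-in for the chlorine/bromine/halogen example, and the "state" attribute is an invented illustration.

```python
# Generalization operators on conjunctive hypotheses, represented as
# frozensets of (attribute, value) tests.

HIERARCHY = {"chlorine": "halogen", "bromine": "halogen"}  # child -> parent

def drop_conjunct(h):
    """All minimal generalizations obtained by removing one conjunct."""
    return [h - {c} for c in h]

def climb_hierarchy(h):
    """Replace a value by its parent class in the hierarchy, if any."""
    out = []
    for (attr, val) in h:
        if val in HIERARCHY:
            out.append(h - {(attr, val)} | {(attr, HIERARCHY[val])})
    return out

h = frozenset({("state", "corrosive"), ("element", "chlorine")})
print(drop_conjunct(h))   # two one-conjunct generalizations
print(climb_hierarchy(h)) # corrosive & [element = halogen]
```

Each operator returns hypotheses that cover at least everything the original covered, which is exactly the generality relation of the previous slides.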
The version space can be defined by two bounds
Representing the version space
Fundamental observation: the version space, structured by a partial order relation, can be defined by:
– an upper bound: the G-set
– a lower bound: the S-set
• G-set = set of the most general hypotheses that are consistent with the training examples
• S-set = set of the most specific hypotheses that are consistent with the training examples
(Figure: in H, the G-set above and the S-set below delimit the hypotheses hi, hj of the version space.)
The candidate elimination learning algorithm
Learning by successive updatings of the version space.
Idea: maintain the S-set and the G-set after each new example.
➥ Candidate elimination algorithm
Candidate elimination learning algorithm
Initialize S and G respectively by:
– the set of the most specific (resp. most general) hypotheses consistent with the 1st positive example provided.
For each new example (positive or negative):
– update S
– update G
Until convergence, or until S = G = Ø.
Updating S
• xi is negative
– Eliminate the hypotheses of S erroneously covering xi
• xi is positive
– Minimally generalize the hypotheses of S that do not cover xi, so that they cover it
– Then eliminate the hypotheses of S:
• covering one or more negative examples
• and/or that are more general than other hypotheses of S
Updating G
• xi is positive
– Eliminate the hypotheses of G that do not cover xi
• xi is negative
– Minimally specialize the hypotheses of G that cover xi, so that the new hypotheses do not cover it
– Then eliminate the hypotheses of G:
• that are not more general than at least one hypothesis of S
• and/or that are more specific than at least one other hypothesis of G
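The update rules above can be sketched for conjunctive attribute-value hypotheses (where '?' matches anything), in the style of Mitchell's classic presentation. This is a simplified version: S is kept as a single hypothesis (sufficient for conjunctions), and the EnjoySport-style data at the bottom is illustrative.

```python
# Candidate Elimination for conjunctive hypotheses over attribute vectors.
# A hypothesis is a tuple of attribute values, where '?' matches anything.

def covers(h, x):
    return all(hv == '?' or hv == xv for hv, xv in zip(h, x))

def more_general(h1, h2):
    """True iff h1 is at least as general as h2."""
    return all(a == '?' or a == b for a, b in zip(h1, h2))

def candidate_elimination(examples):
    n = len(examples[0][0])
    S = None                                  # most specific bound
    G = [('?',) * n]                          # most general bound
    for x, positive in examples:
        if positive:
            G = [g for g in G if covers(g, x)]       # prune G
            if S is None:
                S = tuple(x)
            else:                                    # minimally generalize S
                S = tuple(sv if sv == xv else '?' for sv, xv in zip(S, x))
        else:
            newG = []
            for g in G:
                if not covers(g, x):
                    newG.append(g)
                    continue
                # minimal specializations excluding x but still covering S
                for i in range(n):
                    if g[i] == '?' and S and S[i] != '?' and S[i] != x[i]:
                        newG.append(g[:i] + (S[i],) + g[i + 1:])
            # drop members of G strictly less general than another member
            G = [g for g in newG
                 if not any(h != g and more_general(h, g) for h in newG)]
    return S, G

data = [  # illustrative EnjoySport-style sample
    (('sunny', 'warm', 'normal', 'strong', 'warm', 'same'), True),
    (('sunny', 'warm', 'high', 'strong', 'warm', 'same'), True),
    (('rainy', 'cold', 'high', 'strong', 'warm', 'change'), False),
    (('sunny', 'warm', 'high', 'strong', 'cool', 'change'), True),
]
S, G = candidate_elimination(data)
print(S)  # ('sunny', 'warm', '?', 'strong', '?', '?')
print(G)
```

Every hypothesis between S and a member of G remains consistent with the data, which is exactly the version space delimited by the two bounds.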
Candidate elimination learning algorithm
Updating the bounds S and G
(Figure: in H, positive examples (a)-(d) make the S bound move up by minimal generalization, while negative examples (a')-(d') make the G bound move down by minimal specialization.)
Exercise
Properties of the algorithm
• Incremental
• Complexity?
• How to use the result if convergence does not occur?
• What does S = G = Ø mean?
• Could the algorithm be used in an active learning mode?
• What to do if the data is noisy?
Illustration: LEX
(Architecture: a loop Problem generation → Problem solving → Critic → Generalization. Solving an exercise produces a detailed trace of the attempted solution, from which the critic extracts learning examples used to refine partially learned heuristics.)

Illustration: LEX (2)
Compute the integral of: ∫ 3x cos(x) dx
∫ 3x cos(x) dx
= 3x sin(x) - ∫ 3 sin(x) dx      (OP2 with u = 3x, dv = cos(x) dx)
= 3x sin(x) - 3 ∫ sin(x) dx      (OP1)
= 3x sin(x) + 3 cos(x) + C       (OP5)

One of the proposed positive examples:
∫ 3x cos(x) dx → apply OP2 with u = 3x, dv = cos(x) dx
Version space for the use of operator OP2:
S = { ∫ 3x cos(x) dx → apply OP2 with u = 3x, dv = cos(x) dx }
G = { ∫ f1(x) f2(x) dx → apply OP2 with u = f1(x), dv = f2(x) dx }
Things that we left aside
How to code the inputs
Learning is easy when we know what to look for
(Figure: three candidate patterns, labelled Yes, Yes, No.)
Inputs and prior knowledge
• Is it a pattern recognition task? A character recognition task? …
• How to code the examples?
0 1 1 1 1 1 1 0 1 1 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 0 1 1 1 1 1 1 1 1 1 0 1 1 1 1 0 0 1 1 1 0
• A right choice of representation can render the learning task trivial
➠ But how can we know the right representation?
A learning puzzle
(Figure: examples labelled f = -1 and f = +1, and a new case with f = ?)
Conclusions
Conclusions
• Several types of learning tasks
– Supervised; unsupervised; reinforcement learning
– But also semi-supervised learning, ranking, on-line learning, …
• Supervised learning = search in a hypothesis space
– How to code the inputs?
– How to choose the hypothesis space H?
– How to explore H?
Wait a minute…
Am I right to choose the hypothesis as I did?
Suppose we choose to learn “rectangles”
• How to choose a rectangle?
(Figures: in the (x, y) plane, the most general hypotheses are the largest rectangles still excluding the negatives; the most specific hypothesis is the tightest rectangle around the positive examples.)
Suppose we choose to learn “rectangles”
• How to learn? Choice of a hypothesis h in the Version Space
• Which performance?
(Figure: a candidate rectangle h in the (x, y) plane.)
A little bit of theory about supervised induction
• What is the expected performance of a hypothesis h that makes no errors on the training set S?
The “empirical risk”:
R_emp(h) = (1/m) Σi=1..m ℓ(h(xi), yi)
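The empirical risk is a one-liner once a loss is chosen; the sketch below uses the 0-1 loss, and the toy threshold hypothesis and sample are illustrative.

```python
# Empirical risk with 0-1 loss: R_emp(h) = (1/m) * sum_i loss(h(x_i), y_i)
def empirical_risk(h, S, loss=lambda yhat, y: int(yhat != y)):
    return sum(loss(h(x), y) for x, y in S) / len(S)

# Toy sample and hypothesis: h predicts +1 when x >= 0
S = [(-2.0, -1), (-1.0, -1), (0.5, +1), (1.0, -1)]
h = lambda x: +1 if x >= 0 else -1
print(empirical_risk(h, S))  # 1 error out of 4 -> 0.25
```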
A little bit of theory about supervised induction
• Learning strategy:
– Choice of a hypothesis of null empirical risk (no error on the learning set S)
– What is the expected performance for h?
(Figure: the target concept f and the learned rectangle h in the (x, y) plane.)
A little bit of theory about supervised induction
– Choice of a hypothesis of null empirical risk (no error on the learning set S)
– What is the expected performance for h?
– What is the risk of having an error R(h) > ε?
(Figure: the error region h Δ f, the symmetric difference between h and the target f.)
A little bit of theory about supervised induction
Which performance?
• Cost of a prediction error: the loss function ℓ(h(x), y)
• Which expected cost if I choose h? The expected cost is the “real risk”:
R(h) = ∫X×Y ℓ(h(x), y) pXY(x, y) dx dy
A little bit of theory about supervised induction
• Let us suppose that h is such that R(h) = pX(h Δ f) ≥ ε.
• What is the probability that h had been chosen nonetheless, i.e. that every training example “falls” outside h Δ f?
After one example: p(R_emp(h) = 0) ≤ 1 - ε
After m examples (i.i.d.): pm(R_emp(h) = 0) ≤ (1 - ε)^m
We want: ∀ε, δ ∈ [0, 1]: pm(R(h) ≥ ε) ≤ δ
(Figure: f, h and the error region h Δ f in the (x, y) plane.)
A little bit of theory about supervised induction
• We want: ∀ε, δ ∈ [0, 1]: pm(R(h) ≥ ε) ≤ δ
That is: (1 - ε)^m ≤ δ
Since (1 - ε)^m ≤ e^(-εm), it suffices that: e^(-εm) ≤ δ
Hence: -εm ≤ ln(δ), i.e.
m ≥ ln(1/δ) / ε
True for ONE hypothesis.
A little bit of theory about supervised induction
• The hypothesis is chosen within H with null error on S (« realisable case »)
• We suppose: |H| < ∞
• We want: ∀ε, δ ∈ [0, 1]: pm(∃h : R_emp(h) = 0 and R(h) ≥ ε) ≤ δ
Then, by the union bound: |H| (1 - ε)^m ≤ |H| e^(-εm) = δ
Hence: -εm ≤ ln(δ) - ln(|H|), i.e.
m ≥ (1/ε) (ln|H| + ln(1/δ))
True for the best hypothesis in H.
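The two sample-size bounds are easy to evaluate numerically; the values of ε, δ and |H| below are arbitrary illustrative choices (|H| = 2^16 echoes the Boolean-function exercise earlier).

```python
import math

# Sample-size bounds from the slides:
#   one hypothesis:             m >= ln(1/delta) / eps
#   finite H, realisable case:  m >= (1/eps) * (ln|H| + ln(1/delta))
def m_single(eps, delta):
    return math.ceil(math.log(1 / delta) / eps)

def m_finite_H(eps, delta, H_size):
    return math.ceil((math.log(H_size) + math.log(1 / delta)) / eps)

print(m_single(0.1, 0.05))           # 30 examples suffice for one hypothesis
print(m_finite_H(0.1, 0.05, 2**16))  # 141: the union bound over |H| = 65536
```

Note that the price of searching a whole hypothesis space only enters through ln|H|: doubling |H| adds a constant, not a factor, to the required sample size.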
A little bit of theory about supervised induction
• The hypothesis is chosen within H with lowest error on S: the « non-realisable case »
Same kind of analysis, but more involved. [Vapnik, 1982, 1989]
Bounds between the real risk and the empirical risk
• H finite, realisable case:
∀h ∈ H, ∀δ ≤ 1: Pm[ R_real(h) ≤ R_emp(h) + (log|H| + log(1/δ)) / m ] > 1 - δ
• H finite, non-realisable case:
∀h ∈ H, ∀δ ≤ 1: Pm[ R_real(h) ≤ R_emp(h) + √( (log|H| + log(1/δ)) / (2m) ) ] > 1 - δ
Bounds between the real risk and the empirical risk
• Non-realisable case and H not finite: how to proceed?
– General approach:
1. Reduce the study of the infinite case to the analysis of a finite set of hypotheses
2. Measure how much it is possible, for any training set S of labeled points, to find a hypothesis in H that can fit S
Bounds between the real risk and the empirical risk
• Measure using the Vapnik-Chervonenkis dimension
– A purely combinatorial measure, which does not depend on the number of examples
– Size of the largest set of points that can be labelled in every possible way by the hypotheses in H:
dVC(H) = max{ m : ΠH(m) = 2^m }
One can compute the bound:
∀h ∈ H, ∀δ ≤ 1: Pm[ R_real(h) ≤ R_emp(h) + √( (8 dVC(H) log(2em / dVC(H)) + 8 log(4/δ)) / m ) ] > 1 - δ
VC-dim: illustration
• dVC(linear separators in the plane) = ?
(Figures (a)-(c): configurations of + and - points that a line can or cannot separate.)
• dVC(rectangles) = ?
(Figures (a)-(d): configurations of + and - points for axis-parallel rectangles.)
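Shattering by axis-parallel rectangles can be checked by brute force: a labelling is realisable iff the bounding box of the positive points contains no negative point. The sketch below exhibits one 4-point "diamond" that rectangles shatter, and shows that adding its centre breaks the shattering; a full proof that dVC = 4 would need the argument for every 5-point set.

```python
from itertools import combinations

# A point set is shattered by axis-parallel rectangles iff, for every
# subset taken as positives, the positives' bounding box excludes all
# negatives (the tightest consistent rectangle is that bounding box).
def shattered_by_rectangles(points):
    def bbox_contains(pos, p):
        xs = [q[0] for q in pos]
        ys = [q[1] for q in pos]
        return min(xs) <= p[0] <= max(xs) and min(ys) <= p[1] <= max(ys)
    for r in range(1, len(points)):           # empty/full subsets are trivial
        for pos in combinations(points, r):
            neg = [p for p in points if p not in pos]
            if any(bbox_contains(pos, p) for p in neg):
                return False
    return True

diamond = [(0, 1), (1, 0), (2, 1), (1, 2)]
print(shattered_by_rectangles(diamond))             # True: 4 points shattered
print(shattered_by_rectangles(diamond + [(1, 1)]))  # False: the centre breaks it
```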