What is Learning? Machine Learning: Introduction and Unsupervised



Page 1: What is Learning? Machine Learning: Introduction and Unsupervised


Machine Learning: Introduction and Unsupervised Learning

Chapter 18.1, 18.2, 18.8.1 and "Introduction to Statistical Machine Learning"

Optional: “A Few Useful Things to Know about Machine Learning,” P. Domingos, Comm. ACM 55, 2012

What is Learning?

• "Learning is making useful changes in our minds" – Marvin Minsky

• "Learning is constructing or modifying representations of what is being experienced" – Ryszard Michalski

• "Learning denotes changes in a system that ... enable a system to do the same task more efficiently the next time" – Herbert Simon

Why do Machine Learning?

• Solve classification problems
• Learn models of data ("data fitting")
• Understand and improve efficiency of human learning (e.g., Computer-Aided Instruction (CAI))
• Discover new things or structures that are unknown to humans ("data mining")
• Fill in skeletal or incomplete specifications about a domain

Major Paradigms of Machine Learning

• Rote Learning
• Induction
• Clustering
• Discovery
• Genetic Algorithms
• Reinforcement Learning
• Transfer Learning
• Learning by Analogy
• Multi-task Learning

Page 2

Inductive Learning

• Generalize from a given set of (training) examples so that accurate predictions can be made about future examples
• Learn an unknown function: f(x) = y
  – x: an input example (aka instance)
  – y: the desired output, a discrete or continuous scalar value
• An h (hypothesis) function is learned that approximates f

Representing "Things" in Machine Learning

• An example or instance, x, represents a specific object ("thing")
• x is often represented by a D-dimensional feature vector x = (x1, ..., xD) ∈ R^D
• Each dimension is called a feature or attribute; values may be continuous or discrete
• x is a point in the D-dimensional feature space
• x is an abstraction of the object that ignores all other aspects (e.g., two people having the same weight and height may be considered identical)

Feature Vector Representation

• Preprocess raw data
  – extract a feature (attribute) vector, x, that describes all attributes relevant for an object
• Each x is a list of (attribute, value) pairs:
  x = [(Rank, queen), (Suit, hearts), (Size, big)]
  – the number of attributes is fixed: Rank, Suit, Size
  – the number of possible values for each attribute is fixed (if discrete):
    Rank: 2, ..., 10, jack, queen, king, ace
    Suit: diamonds, hearts, clubs, spades
    Size: big, small
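The (attribute, value)-pair representation above can be sketched as a small data structure. The function and dictionary names here are illustrative, not from the slides; the attribute domains follow the card example:

```python
# Fixed attributes with fixed (discrete) value domains, as in the card example.
DOMAINS = {
    "Rank": ["2", "3", "4", "5", "6", "7", "8", "9", "10",
             "jack", "queen", "king", "ace"],
    "Suit": ["diamonds", "hearts", "clubs", "spades"],
    "Size": ["big", "small"],
}

def make_example(rank, suit, size):
    """Build a feature vector as a list of (attribute, value) pairs,
    checking each value against its attribute's fixed domain."""
    x = [("Rank", rank), ("Suit", suit), ("Size", size)]
    for attr, val in x:
        assert val in DOMAINS[attr], f"{val!r} is not a legal {attr}"
    return x

x = make_example("queen", "hearts", "big")
print(x)  # [('Rank', 'queen'), ('Suit', 'hearts'), ('Size', 'big')]
```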

Types of Features

• A numerical feature has discrete or continuous values that are measurements, e.g., a person's weight
• A categorical feature is one that has two or more values (categories), but there is no intrinsic ordering of the values, e.g., a person's religion (aka nominal feature)
• An ordinal feature is similar to a categorical feature, but there is a clear ordering of the values, e.g., economic status with three values: low, medium and high

Page 3

Feature Vector Representation

Each example can be interpreted as a point in a D-dimensional feature space, where D is the number of features/attributes.

[Figure: cards plotted in a 2-D feature space; axes Rank (2, 4, 6, 8, 10, J, Q, K) and Suit (spades, clubs, hearts, diamonds)]

Feature Vector Representation Example

• Text document
  – vocabulary of size D (~100,000): aardvark, ..., zulu
• "Bag of words": counts of each vocabulary entry
  – To marry my true love → (3531:1, 13788:1, 19676:1)
  – I wish that I find my soul mate this year → (3819:1, 13448:1, 19450:1, 20514:1)
• Often remove "stop words": the, of, at, in, ...
• A special "out-of-vocabulary" (OOV) entry catches all unknown words
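The sparse "index:count" encoding above can be sketched as follows. The toy vocabulary, stop-word list, and word indices here are made up for illustration, not the actual ~100,000-entry vocabulary:

```python
# Toy bag-of-words featurizer: map a document to a sparse {word_index: count}.
VOCAB = {"find": 0, "love": 1, "marry": 2, "soul": 3, "mate": 4, "true": 5}
STOP_WORDS = {"to", "my", "i", "that", "this", "wish", "year"}
OOV = len(VOCAB)  # special out-of-vocabulary index catches unknown words

def bag_of_words(text):
    counts = {}
    for word in text.lower().split():
        if word in STOP_WORDS:
            continue  # stop words are dropped entirely
        idx = VOCAB.get(word, OOV)
        counts[idx] = counts.get(idx, 0) + 1
    return counts

print(bag_of_words("To marry my true love"))  # {2: 1, 5: 1, 1: 1}
```

Any word outside the vocabulary (e.g., "zulu") falls into the single OOV bucket rather than getting its own dimension.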

More Feature Representations

• Image
  – color histogram
• Software
  – execution profile: the number of times each line is executed
• Bank account
  – credit rating, balance, # deposits in last day, week, month, year, # withdrawals, ...
• Bioinformatics
  – medical test 1, test 2, test 3, ...

Training Set

• A training set (aka training sample) is a collection of examples (aka instances), x1, ..., xn, which is the input to the learning process
• xi = (xi1, ..., xiD)
• Assume these instances are all sampled independently from the same, unknown (population) distribution, P(x)
• We denote this by xi ∼ P(x) i.i.d., where i.i.d. stands for independent and identically distributed
• Example: repeated throws of dice

Page 4

Training Set

• A training set is the "experience" given to a learning algorithm
• What the algorithm can learn from it varies
• Two basic learning paradigms:
  – unsupervised learning
  – supervised learning

Inductive Learning

• Supervised vs. unsupervised learning
  – supervised: a "teacher" gives a set of (x, y) pairs
  – unsupervised: only the x's are given
• In either case, the goal is to estimate f so that it generalizes well to "correctly" deal with "future examples" in computing f(x) = y
  – That is, find f that minimizes some measure of the error over a set of samples

Unsupervised Learning

• Training set is x1, ..., xn; that's it!
• No "teacher" providing supervision as to how individual examples should be handled
• Common tasks:
  – Clustering: separate the n examples into groups
  – Discovery: find hidden or unknown patterns
  – Novelty detection: find examples that are very different from the rest
  – Dimensionality reduction: represent each example with a lower-dimensional feature vector while maintaining key characteristics of the training samples

Clustering

• Goal: group the training samples into clusters such that examples in the same cluster are similar, and examples in different clusters are different
• How many clusters do you see?
• Many clustering algorithms

Page 5

Oranges and Lemons

(from Iain Murray, http://homepages.inf.ed.ac.uk/imurray2/)

Google News

Digital Photo Collections

• You have 1000s of digital photos stored in various folders
• Organize them better by grouping into clusters
  – Simplest idea: use image creation time (EXIF tag)
  – More complicated: extract image features

Histogram-Based Image Segmentation

• Goal: segment the image into K regions
  – Reduce the number of gray levels to K and map each pixel to the closest gray level

Page 6

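The "map each pixel to the closest gray level" step above can be sketched directly. This is a minimal 1-D illustration with hand-picked gray levels; the slides do not prescribe how the K levels are chosen (k-means on the intensity histogram is one common choice):

```python
def segment(pixels, levels):
    """Map each gray-level pixel to the closest of K representative levels,
    giving a K-region segmentation of the (flattened) image."""
    return [min(levels, key=lambda g: abs(g - p)) for p in pixels]

# Toy 1-D "image" with dark and bright pixels, segmented into K=2 regions.
pixels = [12, 15, 200, 210, 14, 205]
print(segment(pixels, levels=[16, 208]))  # [16, 16, 208, 208, 16, 208]
```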

Detecting Events on Twitter

• Use real-time text and images from tweets to discover new social events
• Clusters defined by similar words and word co-occurrences, plus similar image features

Google's Embedding Projector Project

Three Frequently Used Clustering Methods

• Hierarchical Agglomerative Clustering
  – Build a binary tree over the dataset by repeatedly merging clusters
• K-Means Clustering
  – Specify the desired number of clusters and use an iterative algorithm to find them
• Mean Shift Clustering

Page 7

Hierarchical Agglomerative Clustering

• Initially every point is in its own cluster
• Find the pair of clusters that are the closest
• Merge the two into a single cluster
• Repeat ...

Page 8

Hierarchical Agglomerative Clustering

• Repeat ... until the whole dataset is one giant cluster
• You get a binary tree (not shown here)

Hierarchical Agglomerative Clustering Algorithm

Hierarchical Agglomerative Clustering

How do you measure the closeness between two clusters? At least three ways:

– Single-linkage: the shortest distance from any member of one cluster to any member of the other cluster
– Complete-linkage: the largest distance from any member of one cluster to any member of the other cluster
– Average-linkage: the average distance between all pairs of members, one from each cluster
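The merge loop and the three linkage rules can be sketched as follows. This is a minimal implementation on 1-D points with absolute difference as the distance; the function names are illustrative:

```python
import itertools

def linkage_distance(A, B, mode="single"):
    """Distance between clusters A and B (lists of numbers) under a linkage rule."""
    pair_dists = [abs(a - b) for a in A for b in B]
    if mode == "single":    # shortest distance between any two members
        return min(pair_dists)
    if mode == "complete":  # largest distance between any two members
        return max(pair_dists)
    return sum(pair_dists) / len(pair_dists)  # average linkage

def agglomerate(points, k, mode="single"):
    """Repeatedly merge the two closest clusters until k clusters remain."""
    clusters = [[p] for p in points]  # initially every point is its own cluster
    while len(clusters) > k:
        i, j = min(itertools.combinations(range(len(clusters)), 2),
                   key=lambda ij: linkage_distance(clusters[ij[0]],
                                                   clusters[ij[1]], mode))
        clusters[i] += clusters.pop(j)  # merge the two into a single cluster
    return clusters

print(agglomerate([1, 2, 9, 10, 30], k=2))  # [[1, 2, 9, 10], [30]]
```

Running the loop all the way to one cluster and recording each merge would give the binary tree (dendrogram) described above.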

Page 9

Distance

• How to measure the distance between a pair of examples, X = (x1, ..., xn) and Y = (y1, ..., yn)?
  – Euclidean: d(X, Y) = √(Σᵢ (xᵢ − yᵢ)²)
  – Manhattan / city-block: d(X, Y) = Σᵢ |xᵢ − yᵢ|
  – Hamming: the number of features that are different between the two examples
  – And many others
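These distance measures can be written directly; a minimal version, treating examples as equal-length tuples of feature values:

```python
import math

def euclidean(X, Y):
    """Square root of the sum of squared per-feature differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(X, Y)))

def manhattan(X, Y):
    """Sum of absolute per-feature differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(X, Y))

def hamming(X, Y):
    """Number of features that differ between the two examples."""
    return sum(1 for x, y in zip(X, Y) if x != y)

print(euclidean((0, 0), (3, 4)))  # 5.0
print(manhattan((0, 0), (3, 4)))  # 7
print(hamming(("queen", "hearts"), ("queen", "spades")))  # 1
```

Hamming distance also works on discrete attributes like the card features, where subtraction is meaningless.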

Hierarchical Agglomerative Clustering

• The binary tree you get is often called a dendrogram, or taxonomy, or a hierarchy of data points
• The tree can be cut at any level to produce different numbers of clusters: if you want k clusters, just cut the (k−1) longest links
• Example: 6 Italian cities, single-linkage

Example created by Matteo Matteucci

Hierarchical Agglomerative Clustering Example

Iteration 1: Merge MI and TO

Recompute the minimum distance from the MI/TO cluster to all other cities

Page 10

Iteration 2: Merge NA and RM
Iteration 3: Merge BA and NA/RM
Iteration 4: Merge FI and BA/NA/RM

Final dendrogram

Page 11

What Factors Affect the Outcome of Hierarchical Agglomerative Clustering?

• Features used
• Range of values for each feature
• Linkage method
• Distance metric used
• Weight of each feature
• ...

Hierarchical Agglomerative Clustering Applet

http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletH.html

Three Frequently Used Clustering Methods

• Hierarchical Agglomerative Clustering
  – Build a binary tree over the dataset
• K-Means Clustering
  – Specify the desired number of clusters and use an iterative algorithm to find them
• Mean Shift Clustering

K-Means Clustering

• Suppose I tell you the cluster centers, ci
  – Q: How to determine which points to associate with each ci?
  – A: For each point x, choose the closest ci
• Suppose I tell you the points in each cluster
  – Q: How to determine the cluster centers?
  – A: Choose ci to be the mean/centroid of all points in the cluster

Page 12

K-Means Clustering

• The dataset. Input: k = 5
• Randomly pick 5 positions as initial cluster centers (not necessarily data points)
• Each point finds which cluster center it is closest to; the point belongs to that cluster
• Each cluster computes its new centroid based on which points belong to it

Page 13

K-Means Clustering

• Each cluster computes its new centroid, based on which points belong to it
• Repeat until convergence (i.e., no cluster center moves)

K-Means Demo

• http://home.dei.polimi.it/matteucc/Clustering/tutorial_html/AppletKM.html

K-Means Algorithm

• Input: x1, ..., xn, k, where each xi is a point in a d-dimensional feature space
• Step 1: Select k cluster centers, c1, ..., ck
• Step 2: For each point xi, determine its cluster: find the closest center (using, say, Euclidean distance)
• Step 3: Update all cluster centers as the centroids:

  cᵢ = (1 / num_pts_in_cluster_i) Σ_{x ∈ cluster i} x

• Repeat steps 2 and 3 until cluster centers no longer change

Example: Image Segmentation

[Figure: input image; clusters on intensity; clusters on color]
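Steps 1-3 above can be sketched on 1-D points. One deliberate deviation, noted in the comments: the slides pick random initial centers, but this sketch takes the first k points as centers so the output is deterministic:

```python
def kmeans(points, k, iters=100):
    """Minimal k-means (Lloyd's algorithm) on a list of 1-D points."""
    centers = points[:k]  # Step 1 (slides use random positions; this is
                          # the first k points, chosen for determinism)
    for _ in range(iters):
        # Step 2: assign each point to its closest center
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda i: abs(p - centers[i]))
            clusters[i].append(p)
        # Step 3: move each center to the centroid of its points
        new_centers = [sum(c) / len(c) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:  # no center moved: converged
            break
        centers = new_centers
    return centers, clusters

centers, clusters = kmeans([1.0, 2.0, 9.0, 10.0], k=2)
print(centers)  # [1.5, 9.5]
```

For real feature vectors, `abs(p - c)` would be replaced by the Euclidean distance and the centroid computed per dimension.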

Page 14

K-Means Properties

• Will it always terminate?
  – Yes (there are a finite number of ways of partitioning a finite number of points into k groups)
• Is it guaranteed to find an "optimal" clustering?
  – No, but each iteration will reduce the distortion (error) of the clustering

Copyright © 2001, 2004, Andrew W. Moore

Non-Optimal Clustering

Say k = 3 and you are given the following points:


Non-Optimal Clustering

Given a poor choice of the initial cluster centers, the following result is possible:

Picking Starting Cluster Centers

Which local optimum k-means goes to is determined solely by the starting cluster centers:

– Idea 1: Run k-means multiple times with different starting, random cluster centers (hill climbing with random restarts)
– Idea 2: Pick a random point x1 from the dataset
  1. Find the point x2 farthest from x1 in the dataset
  2. Find x3 farthest from the closer of x1, x2
  3. ... Pick k points like this, and use them as the starting cluster centers for the k clusters
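Idea 2 (farthest-point initialization) can be sketched on 1-D points; the same logic works in any dimension once `abs` is replaced by a vector distance. For determinism this sketch uses the first point rather than a random one:

```python
def farthest_point_init(points, k):
    """Pick k starting centers: the first point stands in for the random
    pick; each subsequent center is the point farthest from its closest
    already-chosen center."""
    centers = [points[0]]
    while len(centers) < k:
        next_pt = max(points, key=lambda p: min(abs(p - c) for c in centers))
        centers.append(next_pt)
    return centers

print(farthest_point_init([1, 2, 3, 20, 21, 50], k=3))  # [1, 50, 21]
```

Note how the third pick is 21, the point whose distance to its nearer chosen center (1 or 50) is largest, not simply the point farthest from any single center.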

Page 15

Picking the Number of Clusters

• Difficult problem
• Heuristic approaches depend on the number of points and the number of dimensions

Measuring Cluster Quality

• Distortion = sum of squared distances of each data point to its cluster center:

  distortion = Σᵢ ‖xᵢ − c(xᵢ)‖², where c(xᵢ) is the center of the cluster containing xᵢ

• The "optimal" clustering is the one that minimizes distortion (over all possible cluster center locations and assignments of points to clusters)
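Distortion can be computed directly from a set of points and centers; a 1-D sketch:

```python
def distortion(points, centers):
    """Sum of squared distances of each data point to its closest cluster center."""
    return sum(min((p - c) ** 2 for c in centers) for p in points)

# Moving the centers onto the data lowers distortion:
print(distortion([1, 2, 9, 10], [0, 5]))      # 46
print(distortion([1, 2, 9, 10], [1.5, 9.5]))  # 1.0
```

Computing this for several values of k produces the curve whose "elbow" is used to pick k on the next slide.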

How to Pick k?

Try multiple values of k and pick the one at the "elbow" of the distortion curve.

[Figure: distortion vs. number of clusters, k]

Uses of K-Means

• Often used as an exploratory data analysis tool
• In one dimension, a good way to quantize real-valued variables into k non-uniform buckets
• Used on acoustic data in speech recognition to convert waveforms into one of k categories (known as Vector Quantization)
• Also used for choosing color palettes on graphical display devices

Page 16

Three Frequently Used Clustering Methods

• Hierarchical Agglomerative Clustering
  – Build a binary tree over the dataset
• K-Means Clustering
  – Specify the desired number of clusters and use an iterative algorithm to find them
• Mean Shift Clustering

Mean Shift Clustering

1. Choose a search window size
2. Choose the initial location of the search window
3. Compute the mean location (centroid of the data) in the search window
4. Center the search window at the mean location computed in Step 3
5. Repeat Steps 3 and 4 until convergence

The mean shift algorithm seeks the mode, i.e., the point of highest density of a data distribution.
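Steps 1-5 can be sketched in one dimension with a flat, fixed-width window. This is a sketch under that assumption; practical mean shift typically uses a kernel-weighted window in feature space:

```python
def mean_shift(points, start, width, max_iters=100):
    """Slide a window of the given width until its centroid stops moving;
    the final location approximates a mode (densest region) of the data."""
    center = start                        # Steps 1-2: window size and location
    for _ in range(max_iters):
        window = [p for p in points if abs(p - center) <= width / 2]
        mean = sum(window) / len(window)  # Step 3: mean location in the window
        if abs(mean - center) < 1e-9:     # Step 5: converged, window stopped
            return mean
        center = mean                     # Step 4: recenter at the mean
    return center

data = [1, 2, 2, 3, 10]  # dense region around 2, one outlier at 10
print(mean_shift(data, start=4, width=6))  # 2.0
```

Starting the search from every data point and grouping points whose windows converge to the same mode yields the clusters.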

Intuitive Description

[Figure, animated over several slides: a distribution of identical points; a region-of-interest window with its centroid; the mean shift vector moving the window. Objective: find the densest region]

Page 17


Page 18


Results

[Figure: mean shift image segmentation results; the feature space is only gray level]

Page 19

Supervised Learning

• A labeled training sample is a collection of examples: (x1, y1), ..., (xn, yn)
• Assume (xi, yi) ∼ P(x, y) i.i.d., and P(x, y) is unknown
• Supervised learning learns a function h: x → y in some function family, H, such that h(x) predicts the true label y on future data x, where (x, y) ∼ P(x, y) i.i.d.
  – Classification: if y is discrete
  – Regression: if y is continuous

Labels

• Examples:
  – Predict gender (M, F) from weight, height
  – Predict adult/juvenile (A, J) from weight, height
• A label y is the desired prediction for an instance x
• Discrete label: classes
  – M, F; A, J: often encoded as 0, 1 or −1, 1
  – Multiple classes: 1, 2, 3, ..., C. No class order implied.
• Continuous label: e.g., blood pressure

Concept Learning

• Determine if a given example is or is not an instance of the concept/class/category
  – If it is, call it a positive example
  – If not, call it a negative example

Example: Mushroom Classification

http://www.usask.ca/biology/fungi/

Edible or poisonous?

Page 20

Mushroom Features/Attributes

1. cap-shape: bell=b, conical=c, convex=x, flat=f, knobbed=k, sunken=s
2. cap-surface: fibrous=f, grooves=g, scaly=y, smooth=s
3. cap-color: brown=n, buff=b, cinnamon=c, gray=g, green=r, pink=p, purple=u, red=e, white=w, yellow=y
4. bruises?: bruises=t, no=f
5. odor: almond=a, anise=l, creosote=c, fishy=y, foul=f, musty=m, none=n, pungent=p, spicy=s
6. gill-attachment: attached=a, descending=d, free=f, notched=n
7. ...

Classes: edible=e, poisonous=p

Supervised Concept Learning by Induction

• Given a training set of positive and negative examples of a concept:
  – {(x1, y1), (x2, y2), ..., (xn, yn)}, where each yi is either + or −
• Construct a description that accurately classifies whether future examples are positive or negative:
  – h(xn+1) = yn+1, where yn+1 is the + or − prediction

Supervised Learning Methods

• k-nearest-neighbors (k-NN) (Chapter 18.8.1)
• Decision trees
• Neural networks (NN)
• Support vector machines (SVM)
• etc.

Inductive Learning by Nearest-Neighbor Classification

A simple approach:
– save each training example as a point in feature space
– classify a new example by giving it the same classification as its nearest neighbor in feature space
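The two steps above can be sketched as a minimal 1-NN classifier using Euclidean distance on numeric feature vectors; the toy weight/height data echoes the gender-prediction example from the Labels slide and is made up for illustration:

```python
import math

def nearest_neighbor(training, x):
    """training: list of (feature_vector, label) pairs ("saved points").
    Classify x with the label of its closest stored example."""
    def dist(a, b):
        return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))
    _, label = min(training, key=lambda ex: dist(ex[0], x))
    return label

# Toy (weight, height) -> gender examples.
training = [((70, 180), "M"), ((80, 175), "M"),
            ((55, 160), "F"), ((60, 165), "F")]
print(nearest_neighbor(training, (58, 162)))  # F
```

Extending this to k-NN means taking the k closest stored examples and returning the majority label (or, for regression, the average of their y values).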

Page 21

k-Nearest-Neighbors (k-NN)

• 1-NN: [Figure: decision boundary]

k-NN

• What if we want regression?
  – Instead of majority vote, take the average of the neighbors' y values
• How to pick k?
  – Split data into training and tuning sets
  – Classify the tuning set with different values of k
  – Pick the k that produces the smallest tuning-set error

k-NN doesn't generalize well if the examples in each class are not well "clustered"

[Figure: cards in the Rank × Suit feature space; axes Rank (2, 4, 6, 8, 10, J, Q, K) and Suit (spades, clubs, hearts, diamonds)]

k-NN Demo

• http://www.cs.cmu.edu/~zhuxj/courseproject/knndemo/KNN.html

Page 22

Inductive Bias

• Inductive learning is an inherently conjectural process. Why?
  – Any knowledge created by generalization from specific facts cannot be proven true
  – It can only be proven false
• Hence, inductive inference is "falsity preserving," not "truth preserving"

Inductive Bias

• Learning can be viewed as searching the hypothesis space H of possible h functions
• Inductive bias
  – is used when one h is chosen over another
  – is needed to generalize beyond the specific training examples
• A completely unbiased inductive algorithm
  – only memorizes the training examples
  – can't predict anything about unseen examples

Inductive Bias

Biases commonly used in machine learning:
– Restricted hypothesis space bias: allow only certain types of h's, not arbitrary ones
– Preference bias: define a metric for comparing h's so as to determine whether one is better than another

Supervised Learning Methods

• k-nearest-neighbor (k-NN)
• Decision trees
• Neural networks (NN)
• Support vector machines (SVM)
• etc.