UVA CS 4501: Machine Learning, Lecture 22: Review
Dr. Yanjun Qi, University of Virginia, Department of Computer Science
12/5/18

Transcript of UVA CS 4501: Machine Learning, Lecture 22: Review

  • UVA CS 4501: Machine Learning

    Lecture 22: Review

    Dr. Yanjun Qi

    University of Virginia
    Department of Computer Science

    12/5/18

  • Objective

    •  To help students be able to build machine learning tools (not just a tool user!!!)

    •  Key results:
       – Able to build a few simple machine learning methods from scratch
       – Able to understand a few complex machine learning methods at the source-code level

  • Today

    q Review of ML methods covered so far
       q Regression (supervised)
       q Classification (supervised)
       q Unsupervised models
       q Learning theory

    q Review of assignments covered so far

  • A Typical Machine Learning Pipeline

    [Pipeline figure] Low-level sensing → pre-processing (e.g. data cleaning) → feature extraction → feature selection (task-relevant) → inference / prediction / recognition, supported by label collection, evaluation, and optimization.

  • An Operational Model of Machine Learning

    [Figure] A Learner is trained on Reference Data (tagged data consisting of input-output pairs) to produce a Model; at deployment, an Execution Engine applies the Model to Production Data.

  • Machine Learning in a Nutshell

    Components: Task, Representation, Score Function, Search/Optimization, Models & Parameters

    ML grew out of work in AI: optimize a performance criterion using example data or past experience, aiming to generalize to unseen data.

  • What we have covered

    Task: regression, classification, clustering, dimension reduction

    Representation: linear function, nonlinear function (e.g. polynomial expansion), locally linear, logistic function (e.g. p(c|x)), tree, multi-layer, probability-density family (e.g. Bernoulli, multinomial, Gaussian, mixture of Gaussians), local function smoothness, ...

    Score Function: MSE, hinge, log-likelihood, EPE (e.g. L2 loss for KNN, 0-1 loss for Bayes classifier), cross-entropy, cluster points' distance to centers, variance, ...

    Search/Optimization: normal equation, gradient descent, stochastic GD, Newton's method, linear programming, quadratic programming (quadratic objective with linear constraints), greedy, EM, async-SGD, eigen-decomposition

    Models, Parameters: regularization (e.g. L1, L2)

  • Scikit-learn algorithm cheat-sheet

    http://scikit-learn.org/stable/tutorial/machine_learning_map/

  • Scikit-learn: Regression

    Linear model fitted by minimizing a regularized empirical loss with SGD

  • Scikit-learn: Classification

    Linear classifiers (SVM, logistic regression, ...) with SGD training.

    Approximate the explicit feature mappings that correspond to certain kernels.

    To combine the predictions of several base estimators built with a given learning algorithm in order to improve generalizability/robustness over a single estimator: (1) averaging/bagging, (2) boosting.

  • Basic PCA

    Bayes-Net / HMM

    K-means + GMM

    Then after classification

  • http://scikit-learn.org/stable/

  • Today

    q Review of ML methods covered so far
       q Regression (supervised)
       q Classification (supervised)
       q Unsupervised models
       q Learning theory

    q Review of assignments covered so far

  • SUPERVISED LEARNING

    •  Find a function to map input space X to output space Y

    •  Generalisation: learn a function/hypothesis from past data in order to "explain", "predict", "model" or "control" new data examples

  • What we have covered (I)

    q Supervised regression models
       – Linear regression (LR)
       – LR with non-linear basis functions
       – Locally weighted LR
       – LR with regularizations
       – Feature selection*

  • A Dataset

    •  Data / points / instances / examples / samples / records: [rows]
    •  Features / attributes / dimensions / independent variables / covariates / predictors / regressors: [columns, except the last]
    •  Target / outcome / response / label / dependent variable: special column to be predicted [last column]

    Here Y is continuous: the output Y takes continuous values.

  • (1) Multivariate Linear Regression

    Task: Regression
    Representation: Y = weighted linear sum of X's
    Score Function: Least-squares
    Search/Optimization: Linear algebra
    Models, Parameters: Regression coefficients θ

    $\hat{y} = f(x) = \theta_0 + \theta_1 x_1 + \theta_2 x_2$
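
    As a minimal illustration (not from the slides, using synthetic data), the least-squares coefficients follow from the normal equation $\theta = (X^T X)^{-1} X^T y$ in a few lines of NumPy:

```python
import numpy as np

# Synthetic data (illustrative only): y = 3 + 1.5*x1 - 2*x2 + noise
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = 3.0 + 1.5 * X[:, 0] - 2.0 * X[:, 1] + 0.1 * rng.normal(size=100)

# Prepend a column of ones so theta_0 acts as the intercept
Xb = np.hstack([np.ones((X.shape[0], 1)), X])

# Normal equation, solved as a linear system rather than with an explicit inverse
theta = np.linalg.solve(Xb.T @ Xb, Xb.T @ y)
print(theta)  # approximately [3.0, 1.5, -2.0]
```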

  • (2) Multivariate Linear Regression with basis Expansion

    Task: Regression
    Representation: Y = weighted linear sum of (X basis expansion)
    Score Function: Least-squares
    Search/Optimization: Linear algebra
    Models, Parameters: Regression coefficients θ

    $\hat{y} = \theta_0 + \sum_{j=1}^{m} \theta_j \varphi_j(x) = \varphi(x)\,\theta$
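
    A brief scikit-learn sketch of the same idea (the quadratic toy data and the degree are illustrative assumptions): the basis expansion $\varphi(x) = [1, x, x^2]$ is applied first, then ordinary least squares is fit on the expanded features, so the model stays linear in $\theta$.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic 1-D data with a quadratic trend (illustrative only)
rng = np.random.default_rng(0)
x = np.linspace(-3, 3, 80).reshape(-1, 1)
y = 1.0 - 2.0 * x[:, 0] + 0.5 * x[:, 0] ** 2 + 0.2 * rng.normal(size=80)

# phi(x) = [1, x, x^2]; only the basis changes, the fit is still least squares
model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
model.fit(x, y)
print(model.predict(np.array([[2.0]])))
```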

  • (3) Locally Weighted / Kernel Regression

    Task: Regression
    Representation: Y = locally weighted linear sum of X's
    Score Function: Least-squares
    Search/Optimization: Linear algebra
    Models, Parameters: Local regression coefficients (conditioned on each test point)

    $\hat{f}(x_0) = \hat{\alpha}(x_0) + \hat{\beta}(x_0)\,x_0$

    $\min_{\alpha(x_0),\,\beta(x_0)} \sum_{i=1}^{N} K_\lambda(x_i, x_0)\,\big[y_i - \alpha(x_0) - \beta(x_0)\,x_i\big]^2$
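
    A minimal NumPy sketch of this idea (synthetic data; the Gaussian kernel and the bandwidth name `tau` are assumptions for illustration): a separate weighted least-squares problem is solved for every query point $x_0$.

```python
import numpy as np

def lwlr_predict(x0, X, y, tau=0.5):
    """Locally weighted linear regression at a single query point x0."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])          # add intercept column
    x0b = np.hstack([1.0, np.atleast_1d(x0)])
    # Gaussian kernel weights K_lambda(x_i, x0); tau plays the role of the bandwidth
    w = np.exp(-np.sum((X - x0) ** 2, axis=1) / (2 * tau ** 2))
    W = np.diag(w)
    # Weighted least squares, re-solved (conditioned) on each test point
    theta = np.linalg.solve(Xb.T @ W @ Xb, Xb.T @ W @ y)
    return x0b @ theta

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(-3, 3, size=(60, 1)), axis=0)
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=60)
print(lwlr_predict(np.array([0.5]), X, y))
```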

  • (4) Regularized multivariate linear regression

    Task: Regression
    Representation: Y = weighted linear sum of X's
    Score Function: Least-squares + regularization norm
    Search/Optimization: Normal equation / SGD / GD
    Models, Parameters: Regularized regression coefficients

    $\min J(\beta) = \sum_{i=1}^{n} \big(y_i - \hat{y}_i\big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$
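
    In scikit-learn the L2-penalized (ridge) and L1-penalized (lasso) versions are one-liners; the data below is synthetic, and `alpha` plays the role of $\lambda$:

```python
import numpy as np
from sklearn.linear_model import Lasso, Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))
y = X @ np.array([2.0, 0.0, -1.0, 0.0, 0.5]) + 0.1 * rng.normal(size=100)

ridge = Ridge(alpha=1.0).fit(X, y)   # L2 penalty shrinks coefficients toward 0
lasso = Lasso(alpha=0.1).fit(X, y)   # L1 penalty drives some coefficients to exactly 0
print(ridge.coef_)
print(lasso.coef_)
```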

  • (5) Feature Selection

    Task: Dimension reduction
    Representation: n/a
    Score Function: Many possible options
    Search/Optimization: Greedy (mostly)
    Models, Parameters: Reduced subset of features

  • (5) Feature Selection

    •  Thousands to millions of low-level features: select the most relevant ones to build better, faster, and easier-to-understand learning machines.

    [Figure] The n × p data matrix X is reduced to n × p' by keeping only p' selected columns.

    From Dr. Isabelle Guyon

  • Today

    q Review of ML methods covered so far
       q Regression (supervised)
       q Classification (supervised)
       q Unsupervised models
       q Learning theory

    q Review of assignments covered so far

  • What we have covered (II)

    q Supervised classification models
       – K-nearest neighbor
       – Support vector machine
       – Logistic regression
       – Neural network (e.g. MLP)
       – Bayes classifier
       – Random forest / decision tree

  • Three major sections for classification

    •  We can divide the large variety of classification approaches into roughly three major types:

    1. Discriminative: directly estimate a decision rule/boundary
       - e.g., logistic regression, support vector machine, decision tree
    2. Generative: build a generative statistical model
       - e.g., naïve Bayes classifier, Bayesian networks
    3. Instance-based classifiers: use observations directly (no models)
       - e.g., K nearest neighbors

  • A Dataset for classification

    •  Data / points / instances / examples / samples / records: [rows]
    •  Features / attributes / dimensions / independent variables / covariates / predictors / regressors: [columns, except the last]
    •  Target / outcome / response / label / dependent variable: special column to be predicted [last column]

    Output as discrete class label: C1, C2, ..., CL

  • (1) K-Nearest Neighbor

    Task: Classification
    Representation: Local smoothness
    Score Function: EPE with L2 loss → conditional mean (extra)
    Search/Optimization: NA
    Models, Parameters: Training samples
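
    A short scikit-learn sketch (the iris data and k = 5 are illustrative choices): the "model" is just the stored training samples, and k is the main hyperparameter to tune, e.g. by cross-validation.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_tr, y_tr)            # "training" only stores the samples
print(knn.score(X_te, y_te))   # accuracy on held-out data
```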

  • (2) Support Vector Machine

    Task: Classification
    Representation: Kernel function K(x_i, x_j)
    Score Function: Margin + hinge loss (optional)
    Search/Optimization: QP with dual form
    Models, Parameters: Dual weights

    $w = \sum_i \alpha_i\, y_i\, x_i$

    $\underset{w,b}{\arg\min}\; \sum_{i=1}^{p} w_i^2 + C \sum_{i=1}^{n} \varepsilon_i \quad \text{subject to } \forall x_i \in D_{train}: \; y_i\,(x_i \cdot w + b) \ge 1 - \varepsilon_i$

    $K(x, z) := \Phi(x)^T \Phi(z)$
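
    A brief soft-margin SVM sketch with scikit-learn (the dataset, C, and the RBF kernel are illustrative choices); standardizing features first is the usual practice before kernel SVMs.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# C trades off margin width against slack; the RBF kernel plays the role of K(x, z)
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0, gamma="scale"))
svm.fit(X_tr, y_tr)
print(svm.score(X_te, y_te))
```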

  • (3) Logistic Regression

    Task: Classification
    Representation: Log-odds(Y) = linear function of X's
    Score Function: EPE, with conditional log-likelihood
    Search/Optimization: Iterative (Newton) method
    Models, Parameters: Logistic weights

    $P(c = 1 \mid x) = \dfrac{e^{\alpha + \beta x}}{1 + e^{\alpha + \beta x}}$
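
    A minimal sketch (synthetic, roughly linearly separable data for illustration): scikit-learn fits the logistic weights by maximizing a regularized conditional log-likelihood.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] - 0.5 * X[:, 1] + 0.3 * rng.normal(size=200) > 0).astype(int)

clf = LogisticRegression().fit(X, y)
print(clf.intercept_, clf.coef_)   # the alpha and beta of the log-odds model
print(clf.predict_proba(X[:3]))    # P(c=0|x) and P(c=1|x) for the first 3 points
```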

  • (4) Neural Network

    Task: Classification / regression
    Representation: Multilayer network topology
    Score Function: Conditional log-likelihood, cross-entropy / MSE
    Search/Optimization: SGD / backprop
    Models, Parameters: Network weights

  • Logistic regression

    Logistic regression can be illustrated as a module: on input x, it outputs

    $\hat{y} = P(Y = 1 \mid X, \Theta) = \dfrac{1}{1 + e^{-z}}, \quad \text{where } z = w^T \vec{x} + b$

    [Figure] A logistic regression unit drawn as a diagram: inputs x1, x2, x3 and a bias term (+1) feed a summing function Σ, followed by a sigmoid activation function that produces the output ŷ.

  • Multi-Layer Perceptron (MLP)

    String a lot of logistic units together. Example: a 3-layer network.

    [Figure] Inputs x1, x2, x3 (plus bias +1) feed a hidden layer of units a1, a2, a3 (plus bias +1), which feeds a single output unit y: input → hidden → output.
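
    A small MLP of this shape can be sketched with scikit-learn (the dataset, hidden-layer size, and iteration budget are illustrative assumptions); it is trained with backpropagation, as reviewed on the next slide.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# One hidden layer of logistic (sigmoid) units, i.e. logistic units strung together
mlp = MLPClassifier(hidden_layer_sizes=(32,), activation="logistic",
                    max_iter=500, random_state=0)
mlp.fit(X_tr, y_tr)
print(mlp.score(X_te, y_te))
```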

  • Backpropagation

    ●  Back-propagation training algorithm:
       – Forward step: network activation
       – Backward step: error propagation

  • Deep Learning Way: Learning features / representation from data

    Feature engineering:
    ü  Most critical for accuracy
    ü  Accounts for most of the computation at test time
    ü  Most time-consuming in the development cycle
    ü  Often hand-crafted and task-dependent in practice

    Feature learning:
    ü  Easily adaptable to new, similar tasks
    ü  Layerwise representation
    ü  Layer-by-layer unsupervised training
    ü  Layer-by-layer supervised training

  • (5) Bayes Classifier

    Task: Classification
    Representation: Probability models p(X|C)
    Score Function: EPE with 0-1 loss, with log-likelihood (optional)
    Search/Optimization: MLE
    Models, Parameters: Probability models' parameters

    $P(X_1, \cdots, X_p \mid C)$

    $\underset{k}{\arg\max}\; P(C_k \mid X) = \underset{k}{\arg\max}\; P(X, C_k) = \underset{k}{\arg\max}\; P(X \mid C_k)\, P(C_k)$

    Gaussian naïve: $\hat{P}(X_j \mid C = c_k) = \dfrac{1}{\sqrt{2\pi}\,\sigma_{jk}} \exp\!\left(-\dfrac{(X_j - \mu_{jk})^2}{2\sigma_{jk}^2}\right)$

    Multinomial: $P(W_1 = n_1, \ldots, W_v = n_v \mid c_k) = \dfrac{N!}{n_{1k}!\, n_{2k}! \cdots n_{vk}!}\; \theta_{1k}^{n_{1k}}\, \theta_{2k}^{n_{2k}} \cdots \theta_{vk}^{n_{vk}}$

    Bernoulli naïve: $p(W_i = \text{true} \mid c_k) = p_{i,k}$

  • Naïve Bayes Classifier

    Difficulty: learning the joint probability $P(X_1, \cdots, X_p \mid C)$

    •  Naïve Bayes classification
       –  Assumption: all input attributes are conditionally independent given the class!

    $P(X_1, X_2, \cdots, X_p \mid C) = P(X_1 \mid X_2, \cdots, X_p, C)\, P(X_2, \cdots, X_p \mid C) = P(X_1 \mid C)\, P(X_2, \cdots, X_p \mid C) = P(X_1 \mid C)\, P(X_2 \mid C) \cdots P(X_p \mid C)$

    [Figure] Naïve Bayes as a graphical model: a class node C with arrows to each feature node X1 ... X6.

    Adapted from Prof. Ke Chen's NB slides
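
    A tiny bag-of-words sketch in the spirit of HW5 (the toy reviews and labels below are made up for illustration): multinomial naive Bayes treats each document as word counts and assumes the counts are conditionally independent given the class.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

docs = ["great movie loved it", "terrible plot awful acting",
        "wonderful acting great plot", "awful movie hated it"]
labels = [1, 0, 1, 0]                    # 1 = positive review, 0 = negative review

vec = CountVectorizer()
X = vec.fit_transform(docs)              # bag-of-words count features
nb = MultinomialNB().fit(X, labels)      # per-class word probabilities via MLE (with smoothing)

print(nb.predict(vec.transform(["loved the acting"])))
```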

  • (6) Decision Tree / Random Forest

    Task: Classification
    Representation: Partition feature space into a set of rectangles; local smoothness
    Score Function: Split purity measure, e.g. information gain / cross-entropy / Gini
    Search/Optimization: Greedy search for partitions
    Models, Parameters: Tree model(s), i.e. the space partition

  • Anatomy of a decision tree

    [Figure] Example tree: the root node tests Outlook (sunny / overcast / rain); the sunny branch tests Humidity (high → No, normal → Yes); overcast → Yes; the rain branch tests Windy (true → No, false → Yes).

    Each node is a test on one feature/attribute; the branches are the possible attribute values of the node; the leaves are the decisions.

  • Information gain

    •  IG(X_i, Y) = H(Y) - H(Y|X_i): the reduction in uncertainty about Y from knowing feature X_i

    Information gain = (information before split) - (information after split) = entropy(parent) - [average entropy(children)]

    With H(Y) fixed, the lower the average children entropy, the better (children nodes are purer); equivalently, for IG, the higher, the better.
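
    A minimal sketch of the computation (toy labels and a toy discrete feature, for illustration only):

```python
import numpy as np

def entropy(labels):
    """H(Y) in bits for an array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(y, x):
    """IG(X, Y) = H(Y) - H(Y|X) for a discrete feature x."""
    h_children = 0.0
    for v in np.unique(x):
        mask = (x == v)
        h_children += mask.mean() * entropy(y[mask])   # weighted average child entropy
    return entropy(y) - h_children

y = np.array([0, 0, 1, 1])
x = np.array(["a", "a", "b", "b"])      # a perfectly informative binary feature
print(information_gain(y, x))           # 1.0 bit
```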

  • Random Forest Classifier

    [Figure] From N examples and M features, build multiple decision trees and take the majority vote of their predictions.

  • Classifiers differ along several axes:

    ü  different assumptions on data
    ü  different scalability profiles at training time
    ü  different latencies at prediction (test) time
    ü  different model sizes (embedability in mobile devices)
    ü  different levels of model interpretability

    Adapted from Olivier Grisel's talk
    https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html

  • Today

    q Review of ML methods covered so far
       q Regression (supervised)
       q Classification (supervised)
       q Unsupervised models
       q Learning theory

    q Review of assignments covered so far

  • What we have covered (III)

    q Unsupervised models
       – Dimension reduction (PCA)
       – Hierarchical clustering
       – K-means clustering
       – GMM / EM clustering

  • An unlabeled Dataset X

    •  Data / points / instances / examples / samples / records: [rows]
    •  Features / attributes / dimensions / independent variables / covariates / predictors / regressors: [columns]

    X is a data matrix of n observations on p variables x1, x2, ..., xp.

    Unsupervised learning = learning from raw (unlabeled, unannotated, etc.) data, as opposed to supervised data where a label for each example is given.

  • (0) Principal Component Analysis (Self Learn)

    Task: Dimension reduction
    Representation: Gaussian assumption
    Score Function: Direction of maximum variance
    Search/Optimization: Eigen-decomposition
    Models, Parameters: Principal components
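
    A short scikit-learn sketch (the correlated synthetic data is an illustrative assumption): PCA finds the directions of maximum variance via an eigen/SVD decomposition and projects the data onto them.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 2))
X = np.column_stack([Z[:, 0], 0.8 * Z[:, 0] + 0.2 * Z[:, 1], Z[:, 1]])  # correlated 3-D data

pca = PCA(n_components=2)
X2 = pca.fit_transform(X)               # project onto the top-2 principal components
print(pca.explained_variance_ratio_)    # fraction of variance captured by each component
```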

  • What we have covered (III)

    q Unsupervised models
       – Dimension reduction (PCA)
       – Hierarchical clustering
       – K-means clustering
       – GMM / EM clustering

  • What is clustering?

    •  Find groups (clusters) of data points such that data points in a group will be similar (or related) to one another and different from (or unrelated to) the data points in other groups

    Intra-cluster distances are minimized; inter-cluster distances are maximized.

  • Issues for clustering

    •  What is a natural grouping among these objects?
       –  Definition of "groupness"
    •  What makes objects "related"?
       –  Definition of "similarity/distance"
    •  Representation for objects
       –  Vector space? Normalization?
    •  How many clusters?
       –  Fixed a priori?
       –  Completely data-driven?
    •  Avoid "trivial" clusters - too large or too small
    •  Clustering algorithms
       –  Partitional algorithms
       –  Hierarchical algorithms
    •  Formal foundation and convergence

  • Clustering Algorithms

    •  Partitional algorithms
       – Usually start with a random (partial) partitioning
       – Refine it iteratively
          •  K-means clustering
          •  Mixture-model based clustering

    •  Hierarchical algorithms
       –  Bottom-up, agglomerative
       –  Top-down, divisive

  • (1) Hierarchical Clustering

    Task: Clustering
    Representation: Distance measure
    Score Function: No clearly defined loss
    Search/Optimization: Greedy bottom-up (or top-down)
    Models, Parameters: Dendrogram (tree)

  • Example: single link

    [Worked example] Starting from the pairwise distance matrix over points 1-5, single-link agglomerative clustering repeatedly merges the two closest clusters and recomputes the reduced distance matrix; the distance between merged clusters is the minimum pairwise distance, e.g.

    $d_{(1,2,3),(4,5)} = \min\{\, d_{(1,2,3),4},\; d_{(1,2,3),5} \,\} = 5$
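
    With SciPy, the same single-link merging can be sketched as follows (the toy 1-D points are illustrative): `linkage` returns the dendrogram encoding, and `fcluster` cuts it at a chosen height.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage

# Toy 1-D points: {0, 0.5, 1} and {5, 5.5} merge early, 9 stays apart longer
X = np.array([[0.0], [0.5], [1.0], [5.0], [5.5], [9.0]])

Z = linkage(X, method="single")   # single link: cluster distance = minimum pairwise distance
print(fcluster(Z, t=2.0, criterion="distance"))   # cluster labels after cutting at height 2.0
```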

  • (2) K-means Clustering

    Task: Clustering
    Representation: Spherical clusters
    Score Function: Sum-of-squared distance to centroid
    Search/Optimization: K-means algorithm (a special case of EM)
    Models, Parameters: Cluster memberships & centroids

  • K-means Clustering: Step 2 - determine the membership of each data point

    [Figure] Each data point is assigned to its nearest centroid.
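
    A brief scikit-learn sketch (two synthetic blobs and K = 2, chosen for illustration): `inertia_` is exactly the sum-of-squared-distance-to-centroid score being minimized.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=0.0, size=(50, 2)),
               rng.normal(loc=5.0, size=(50, 2))])

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.cluster_centers_)   # the two centroids
print(km.inertia_)           # sum of squared distances of points to their centroid
```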

  • (3) GMM Clustering

    Task: Clustering
    Representation: Mixture of Gaussians
    Score Function: Likelihood
    Search/Optimization: EM algorithm
    Models, Parameters: Each point's soft membership & per-cluster mean / covariance

    $\log \prod_{i=1}^{n} p(x = x_i) = \sum_i \log \left[ \sum_{\mu_j} p(\mu = \mu_j)\, \frac{1}{(2\pi)^{p/2} \left|\Sigma_j\right|^{1/2}}\, e^{-\frac{1}{2} (\vec{x}_i - \vec{\mu}_j)^T \Sigma_j^{-1} (\vec{x}_i - \vec{\mu}_j)} \right]$

  • Expectation-Maximization for training GMM

    •  Start:
       – "Guess" the centroid m_k and covariance S_k of each of the K clusters

    •  Loop:
       – E-step: for each point, revise its proportions (soft memberships) belonging to each of the K clusters
       – M-step: for each cluster, revise both the mean (centroid position) and the covariance (shape)
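
    A compact sketch with scikit-learn's EM-based `GaussianMixture` (synthetic two-blob data and K = 2 are illustrative): the E-step yields the soft memberships exposed by `predict_proba`, and the M-step re-estimates each component's mean, covariance, and mixing weight.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=0.0, size=(100, 2)),
               rng.normal(loc=4.0, size=(100, 2))])

gmm = GaussianMixture(n_components=2, covariance_type="full", random_state=0).fit(X)
print(gmm.means_)                 # per-cluster means
print(gmm.predict_proba(X[:3]))   # soft membership of the first 3 points
```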

  • Compare: K-means

    •  The EM algorithm for mixtures of Gaussians is like a "soft version" of the K-means algorithm.

    •  In the K-means "E-step" we do hard assignment.

    •  In the K-means "M-step" we update the means as the weighted sum of the data, but now the weights are 0 or 1.

  • Clustering algorithms differ along several axes:

    ü  different assumptions on data
    ü  different scalability profiles
    ü  different model sizes (embedability in mobile devices)

    https://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html

  • Today

    q Review of ML methods covered so far
       q Regression (supervised)
       q Classification (supervised)
       q Unsupervised models
       q Learning theory

    q Review of assignments covered so far

  • What we have covered (IV)

    q Learning theory / model selection
       – K-fold cross validation
       – Expected prediction error
       – Bias and variance tradeoff

  • CV-based Model Selection

    We're trying to decide which algorithm / hyperparameter to use.
    •  We train each model and make a table...

    Hyperparameter tuning...
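
    A minimal sketch of that table with scikit-learn (choosing k for KNN on the iris data, both illustrative): `GridSearchCV` runs K-fold CV for every candidate value and keeps the scores.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# 5-fold CV over a small hyperparameter grid; cv_results_ holds the "table" of scores
grid = GridSearchCV(KNeighborsClassifier(),
                    param_grid={"n_neighbors": [1, 3, 5, 7, 9]},
                    cv=5)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```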

  • Statistical Decision Theory (Extra)

    •  Random input vector: X
    •  Random output variable: Y
    •  Joint distribution: Pr(X, Y)
    •  Loss function: L(Y, f(X)), e.g. squared error loss (also called L2 loss)

    •  Expected prediction error (EPE), taken over the population distribution:

    $EPE(f) = E\big(L(Y, f(X))\big) = \int L\big(y, f(x)\big)\, \Pr(dx, dy), \quad \text{e.g.} = \int \big(y - f(x)\big)^2 \Pr(dx, dy)$

  • Bias-Variance Trade-off for EPE (Extra)

    EPE = noise² + bias² + variance

    –  Noise: unavoidable error
    –  Bias: error due to incorrect assumptions
    –  Variance: error due to variance of training samples

    Slide credit: D. Hoiem

  • Bias-Variance Tradeoff / Model Selection

    [Figure] Error vs. model complexity, with the underfit region on the left and the overfit region on the right.

  • Model "bias" & Model "variance"

    •  Middle RED: the TRUE function
    •  Error due to bias: how far off, in general, the fits are from the middle red
    •  Error due to variance: how wildly the blue points spread

  • Need to make assumptions that are able to generalize

    •  Components of generalization error
       –  Bias: how much does the average model over all training sets differ from the true model?
          •  Error due to inaccurate assumptions/simplifications made by the model
       –  Variance: how much do models estimated from different training sets differ from each other?

    •  Underfitting: the model is too "simple" to represent all the relevant class characteristics
       –  High bias and low variance
       –  High training error and high test error

    •  Overfitting: the model is too "complex" and fits irrelevant characteristics (noise) in the data
       –  Low bias and high variance
       –  Low training error and high test error

    Slide credit: L. Lazebnik

  • Machine Learning in a Nutshell

    Components: Task, Representation, Score Function, Search/Optimization, Models & Parameters

    ML grew out of work in AI: optimize a performance criterion using example data or past experience, aiming to generalize to unseen data.

  • What we have covered for each component

    Task: regression, classification, clustering, dimension reduction

    Representation: linear function, nonlinear function (e.g. polynomial expansion), locally linear, logistic function (e.g. p(c|x)), tree, multi-layer, probability-density family (e.g. Bernoulli, multinomial, Gaussian, mixture of Gaussians), local function smoothness, kernel matrix, partition of feature space, ...

    Score Function: MSE, margin, log-likelihood, EPE (e.g. L2 loss for KNN, 0-1 loss for Bayes classifier), cross-entropy, cluster points' distance to centers, variance, conditional log-likelihood, complete-data likelihood, regularized loss functions (e.g. L1, L2), goodness of inter-cluster similarity, ...

    Search/Optimization: normal equation, gradient descent, stochastic GD, Newton's method, linear programming, quadratic programming (quadratic objective with linear constraints), greedy, EM, async-SGD, eigen-decomposition, backprop

    Models, Parameters: linear weight vector, basis weight vector, local weight vector, dual weights, training samples, tree / dendrogram, multi-layer weights, principal components, (soft/hard) membership assignments, cluster centroids, cluster covariance (shape), ...

  • Today

    q Review of ML methods covered so far
       q Regression (supervised)
       q Classification (supervised)
       q Unsupervised models
       q Learning theory

    q Review of assignments covered so far

  • HW1

    •  Q1: Linear algebra review
    •  Q2: Linear regression model fitting
       – Data loading
       – Basic linear regression
       – Three ways to train: normal equation / SGD / batch GD

    •  Sample exam Q:
       –  Regression model fitting

  • HW2

    •  Q1: Polynomial regression
    •  Q2: Ridge regression
       – Math derivation of ridge
       – Understand why/how ridge works
       – Model selection of ridge with K-fold CV

    •  Sample Qs:
       – Regularization

  • HW3

    •  Q1: KNN and model selection
    •  Q2: Support vector machines with Scikit-Learn
       – Data preprocessing
       – How to use the SVM package
       – Model selection for SVM
       – Model selection pipeline with train-validation, or train-CV; then test

    •  Sample QAs:
       –  SVM

  • HW4

    •  Q1: Neural Network TensorFlow Playground
       –  Interactive learning of MLP
       –  Feature engineering vs. feature learning
    •  Q2: Image classification / Keras
       –  DNN tool: using Keras
       –  Extra: PCA for image classification

    •  Sample Qs:
       – Neural nets

  • HW5

    •  Q1: Naive Bayes classifier for text-based movie review classification
       – Preprocessing of text samples
       – BOW document representation
       – Multinomial naive Bayes classifier (BOW way)
       – Multivariate Bernoulli naive Bayes classifier

    •  Sample Qs:
       – Bayes classifier

  • HW6

    •  Q1: Unsupervised clustering of audio data and consensus data
       – Data loading
       – K-means clustering
       – How to find K: knee-finding plot
       – How to measure clustering: purity metric

    •  Sample Qs:
       – K-means and GMM
       – Decision trees

  • Highly Recommended Four Extra-curriculum Books

    •  1. Book: Algorithms to Live By: The Computer Science of Human Decisions
       – https://books.google.com/books/about/Algorithms_to_Live_By_The_Computer_Scien.html?id=xmeJCgAAQBAJ&source=kp_book_description
       – This book provides a fascinating exploration of how computer algorithms can be applied to our everyday lives.

  • Highly Recommended Four Extra-curriculum Books

    •  2. A Mind for Numbers: How to Excel at Math and Science (Even If You Flunked Algebra)
       –  https://www.goodreads.com/book/show/18693655-a-mind-for-numbers
       –  A good summary: https://fastertomaster.com/a-mind-for-numbers-barbara-oakley/
       –  The book is geared toward helping students study well. Despite the math in the title, most of the advice in this book is appropriate for just about any subject.

  • Highly Recommended Four Extra-curriculum Books

    •  3. Book: So Good They Cannot Ignore You - Why Skills Trump Passion in the Quest for Work You Love
       – https://www.amazon.com/Good-They-Cant-Ignore-You-ebook/dp/B0076DDBJ6/ref=tmm_kin_swatch_0?_encoding=UTF8&qid=1497747881&sr=1-1
       – The idea of career capital: rare and valuable skills need deliberate practice
       – 10,000 hours of deliberate practice → expert!

  • Highly Recommended Four Extra-curriculum Books

    •  4. Book: Homo Deus - A Brief History of Tomorrow
       –  https://www.goodreads.com/book/show/31138556-homo-deus
       –  "Homo Deus explores the projects, dreams and nightmares that will shape the twenty-first century, from overcoming death to creating artificial life. It asks the fundamental questions: Where do we go from here? And how will we protect this fragile world from our own destructive powers? This is the next stage of evolution. This is Homo Deus."

  • References

    q Hastie, Trevor, et al. The Elements of Statistical Learning. Vol. 2, No. 1. New York: Springer, 2009.
    q Prof. M. A. Papalaskar's slides
    q Prof. Andrew Ng's slides