Unit 3: Multiple Linear Regression
Transcript of Unit 3: Multiple Linear Regression
EL-GY 6143/CS-GY 6923: Introduction to Machine Learning
Prof. Pei Liu
Learning Objectives
- Formulate a machine learning model as a multiple linear regression model.
  - Identify the prediction vector and target for the problem.
- Write the regression model in matrix form. Write the feature matrix.
- Compute the least-squares solution for the regression coefficients on training data.
- Derive the least-squares formula from minimization of the RSS.
- Manipulate 2D arrays in Python (indexing, stacking, computing shapes, ...).
- Compute the LS solution using Python linear algebra and machine learning packages.
Outline
- Motivating Example: Understanding glucose levels in diabetes patients
- Multiple variable linear models
- Least squares solutions
- Computing the solutions in Python
- Special case: Simple linear regression
- Extensions
Example: Blood Glucose Level
- Diabetes patients must monitor their glucose level.
- What causes blood glucose levels to rise and fall?
- Many factors.
- We know the mechanisms qualitatively.
- But quantitative models are difficult to obtain:
  - Hard to derive from first principles.
  - Difficult to model the physiological process precisely.
- Can machine learning help?
Data from AIM 94 Experiment
- Data collected as a series of events:
  - Eating
  - Exercise
  - Insulin dosage
- Target variable: glucose level, monitored over time
Demo on GitHub
- All code is available on GitHub:
  https://github.com/pliugithub/MachineLearning/blob/master/unit03_mult_lin_reg/demo_glucose.ipynb
Using Google Colaboratory
- Two options for running the demos:
- Option 1:
  - Clone the GitHub repository to your local machine and run it using Jupyter notebook.
  - Need to install all the software correctly.
- Option 2: Run in the cloud on Google Colaboratory.
- For Option 2:
  - Go to https://colab.research.google.com/
  - File -> Open Notebook
  - Select the GitHub tab
  - Enter the GitHub URL: https://github.com/sdrangan/introml
  - Select unit03_mult_lin_reg/demo1_glucose.ipynb
Demo on Google Colab
Outline
- Motivating Example: Understanding glucose levels in diabetes patients
- Multiple variable linear models
- Least squares solutions
- Computing the solutions in Python
- Special case: Simple linear regression
- Extensions
Simple vs. Multiple Regression
- Simple linear regression: one predictor (feature)
  - Scalar predictor $x$
  - Linear model: $\hat{y} = \beta_0 + \beta_1 x$
  - Can only account for one variable
- Multiple linear regression: multiple predictors (features)
  - Vector predictor $\mathbf{x} = (x_1, \ldots, x_d)$
  - Linear model: $\hat{y} = \beta_0 + \beta_1 x_1 + \cdots + \beta_d x_d$
  - Can account for multiple predictors
  - Reduces to simple linear regression when $d = 1$
Comparison to Single Variable Models
- We could compute models for each variable separately:
  $y = a_1 + b_1 x_1, \quad y = a_2 + b_2 x_2, \quad \vdots$
- But this doesn't provide a way to account for joint effects.
- Example: Consider three linear models for predicting longevity:
  - A: Longevity vs. some factor in diet (e.g., amount of fiber consumed)
  - B: Longevity vs. exercise
  - C: Longevity vs. diet AND exercise
  - What does C tell you that A and B do not?
Special Case: Single Variable
- Suppose $d = 1$ predictor.
- Feature matrix and coefficient vector:
  $A = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \quad
  \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}$
- LS solution: $\hat{\boldsymbol{\beta}} = \left(\tfrac{1}{n}A^T A\right)^{-1}\tfrac{1}{n}A^T\mathbf{y} = P^{-1}\mathbf{b}$, with
  $P = \begin{bmatrix} 1 & \bar{x} \\ \bar{x} & \overline{x^2} \end{bmatrix}, \quad
  \mathbf{b} = \begin{bmatrix} \bar{y} \\ \overline{xy} \end{bmatrix}$
- Obtain the single variable solutions for the coefficients (after some algebra):
  $\beta_1 = \frac{s_{xy}}{s_x^2}, \quad \beta_0 = \bar{y} - \beta_1\bar{x}, \quad
  r^2 = \frac{s_{xy}^2}{s_x^2\, s_y^2}$
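The closed-form expressions above translate directly into a few lines of NumPy. The sketch below is illustrative only; the arrays x and y are made-up placeholders, not the diabetes data from the demo.

```python
import numpy as np

# Placeholder 1D data (not the diabetes data)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Sample statistics
xm, ym = x.mean(), y.mean()
sxy = np.mean((x - xm) * (y - ym))   # sample cross-covariance s_xy
sxx = np.mean((x - xm) ** 2)         # sample variance s_x^2
syy = np.mean((y - ym) ** 2)         # sample variance s_y^2

# Single-variable least squares coefficients and r^2
beta1 = sxy / sxx
beta0 = ym - beta1 * xm
r2 = sxy ** 2 / (sxx * syy)
print(beta0, beta1, r2)
```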
Loading the Data
- scikit-learn package:
  - Many methods for machine learning
  - Datasets
  - Will use throughout this class
- The diabetes dataset is one example.
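As a sketch of the loading step, the diabetes dataset can be pulled directly from scikit-learn; the exact cell in the demo notebook may differ slightly.

```python
from sklearn import datasets

# Load the diabetes dataset shipped with scikit-learn
diabetes = datasets.load_diabetes()
X = diabetes.data       # feature matrix, shape (442, 10)
y = diabetes.target     # target vector, shape (442,)

print(X.shape, y.shape)
print(diabetes.feature_names)   # ['age', 'sex', 'bmi', 'bp', 's1', ..., 's6']
```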
Simple Linear Regression for Diabetes Data
- Try a fit of each variable individually.
- Compute the $r^2$ coefficient for each variable.
- Use the formula on the previous slide.
- The "best" individual variable is a poor fit: $r^2 \approx 0.34$.
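A minimal sketch of these per-variable fits, using the formulas from the previous slide (this is not the exact demo code; it simply loops over the columns of the diabetes data loaded above).

```python
import numpy as np
from sklearn import datasets

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

ym = y.mean()
syy = np.mean((y - ym) ** 2)

# Fit y against each feature individually and report r^2
for j, name in enumerate(diabetes.feature_names):
    xj = X[:, j]
    sxy = np.mean((xj - xj.mean()) * (y - ym))
    sxx = np.mean((xj - xj.mean()) ** 2)
    r2 = sxy ** 2 / (sxx * syy)
    print(f"{name:4s}  r^2 = {r2:.3f}")
```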
Scatter Plot
- No one variable explains glucose well.
- Multiple linear regression could be much better.
Outline
- Motivating Example: Understanding glucose levels in diabetes patients
- Multiple variable linear models
- Least squares solutions
- Computing the solutions in Python
- Special case: Simple linear regression
- Extensions
Finding a Mathematical Model
- Goal: Find a function to predict glucose level from the 10 attributes.
- Problem: Several attributes
  - Need a multi-variable function
- Attributes: $x_1$: Age, $x_2$: Sex, $x_3$: BMI, $x_4$: BP, $x_5$: S1, ..., $x_{10}$: S6
- Target: $y$ = glucose level
- Model: $y \approx \hat{y} = f(x_1, \ldots, x_{10})$
Matrix Representation of Data
- Data is a matrix and a target vector.
- $n$ samples: one sample per row.
- $d$ features / attributes / predictors: one feature per column.
- This example:
  - $y_i$ = blood glucose measurement of the $i$-th sample
  - $x_{i,j}$ = $j$-th feature of the $i$-th sample
  - $\mathbf{x}_i^T = [x_{i,1}, x_{i,2}, \ldots, x_{i,d}]$: feature or predictor vector
  - The $i$-th sample is $(\mathbf{x}_i, y_i)$
- Feature matrix (samples in rows, attributes in columns) and target vector:
  $X = \begin{bmatrix} x_{11} & \cdots & x_{1d} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nd} \end{bmatrix}, \quad
  \mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$
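In NumPy this is simply a 2D array for the feature matrix and a 1D array for the target. A small sketch with the diabetes data (variable names are illustrative):

```python
from sklearn import datasets

diabetes = datasets.load_diabetes()
X = diabetes.data      # 2D array: n samples (rows) x d features (columns)
y = diabetes.target    # 1D array of n targets

n, d = X.shape
print(n, d)            # 442 samples, 10 features

x0 = X[0, :]           # feature vector of the first sample, shape (d,)
bmi = X[:, 2]          # one feature (column) across all samples, shape (n,)
```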
In-Class Exercise
Outline
- Motivating Example: Understanding glucose levels in diabetes patients
- Multiple variable linear models
- Least squares solutions
- Computing the solutions in Python
- Special case: Simple linear regression
- Extensions
Multivariable Linear Model for Glucose
- Goal: Find a function to predict glucose level from the 10 attributes.
- Linear Model: Assume glucose is a linear function of the predictors:
  glucose $\approx$ prediction $= \beta_0 + \beta_1\,\text{Age} + \cdots + \beta_4\,\text{BP} + \beta_5\,\text{S1} + \cdots + \beta_{10}\,\text{S6}$
- General form:
  - Attributes (10 features): Age, Sex, BMI, BP, S1, ..., S6, so $\mathbf{x} = [x_1, \ldots, x_{10}]$
  - Target: $y$ = glucose level
  - $y \approx \hat{y} = \beta_0 + \beta_1 x_1 + \cdots + \beta_4 x_4 + \beta_5 x_5 + \cdots + \beta_{10} x_{10}$, where $\beta_0$ is the intercept.
Multiple Variable Linear Model
- Vector of features: $\mathbf{x} = [x_1, \ldots, x_d]$
  - $d$ features (also known as predictors, independent variables, attributes, covariates, ...)
- Single target variable $y$
  - What we want to predict
- Linear model: Make a prediction $\hat{y}$:
  $y \approx \hat{y} = \beta_0 + \beta_1 x_1 + \cdots + \beta_d x_d$
- Data for training:
  - Samples are $(\mathbf{x}_i, y_i)$, $i = 1, 2, \ldots, n$.
  - Each sample has a vector of features $\mathbf{x}_i = [x_{i1}, \ldots, x_{id}]$ and a scalar target $y_i$.
- Problem: Learn the best coefficients $\boldsymbol{\beta} = [\beta_0, \beta_1, \ldots, \beta_d]$ from the training data.
Example: Heart Rate Increase
- Linear Model: HR increase $\approx \beta_0 + \beta_1\,(\text{mins exercise}) + \beta_2\,(\text{exercise intensity})$
- Data (measuring fitness of athletes):

  Subject number | HR before | HR after | Mins on treadmill | Speed (min/km) | Days exercise/week
  123            | 60        | 90       | 1                 | 5.2            | 3
  456            | 80        | 110      | 2                 | 4.1            | 1
  789            | 70        | 130      | 5                 | 3.5            | 2
  ...            | ...       | ...      | ...               | ...            | ...

- https://www.mercurynews.com/2017/10/29/4851089/
Why Use a Linear Model?
- Many natural phenomena have a linear relationship.
- Predictor has small variation:
  - Suppose $y = f(x)$.
  - If the variation of $x$ is small around some value $x_0$, then
    $y \approx f(x_0) + f'(x_0)(x - x_0) = \beta_0 + \beta_1 x$, with
    $\beta_0 = f(x_0) - f'(x_0)\,x_0, \quad \beta_1 = f'(x_0)$
- Simple to compute.
- Easy to interpret the relation:
  - Coefficient $\beta_j$ indicates the importance of feature $j$ for the target.
- Advanced: Gaussian random variables:
  - If two variables are jointly Gaussian, the optimal predictor of one from the other is a linear predictor.
Matrix Review
- Consider
  $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}, \quad
  B = \begin{bmatrix} 2 & 0 \\ 3 & 2 \end{bmatrix}, \quad
  \mathbf{x} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$
- Compute (computations on the board):
  - Matrix-vector multiply: $A\mathbf{x}$
  - Transpose: $A^T$
  - Matrix multiply: $AB$
  - Solution to linear equations: solve for $\mathbf{u}$: $\mathbf{x} = B\mathbf{u}$
  - Matrix inverse: $B^{-1}$
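The same board computations can be checked in NumPy; a short sketch (not part of the original slides):

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
B = np.array([[2, 0], [3, 2]])
x = np.array([2, 3])

print(A @ x)                  # matrix-vector multiply A x
print(A.T)                    # transpose A^T
print(A @ B)                  # matrix multiply A B
print(np.linalg.solve(B, x))  # solve B u = x for u
print(np.linalg.inv(B))       # matrix inverse B^{-1}
```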
Slopes, Intercept and Inner Products
- Model with coefficients $\boldsymbol{\beta}$: $\hat{y} = \beta_0 + \beta_1 x_1 + \cdots + \beta_d x_d$
- Sometimes use the weight-bias version:
  $\hat{y} = b + w_1 x_1 + \cdots + w_d x_d$
  - $b = \beta_0$: bias or intercept
  - $\mathbf{w} = \boldsymbol{\beta}_{1:d} = [\beta_1, \ldots, \beta_d]$: weights or slope vector
- Can write either with an inner product:
  $\hat{y} = \beta_0 + \boldsymbol{\beta}_{1:d} \cdot \mathbf{x}$ or $\hat{y} = b + \mathbf{w} \cdot \mathbf{x}$
- Inner product:
  - $\mathbf{w} \cdot \mathbf{x} = \sum_{j=1}^{d} w_j x_j$
  - Will use the alternate notation $\mathbf{w}^T\mathbf{x} = \langle \mathbf{w}, \mathbf{x} \rangle$
Matrix Form of Linear Regression
- Data: $(\mathbf{x}_i, y_i)$, $i = 1, \ldots, n$
- Predicted value for the $i$-th sample: $\hat{y}_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_d x_{id}$
- Matrix form:
  $\begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{bmatrix} =
  \begin{bmatrix} 1 & x_{11} & \cdots & x_{1d} \\ 1 & x_{21} & \cdots & x_{2d} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nd} \end{bmatrix}
  \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_d \end{bmatrix}$
- Matrix equation: $\hat{\mathbf{y}} = \mathbf{A}\boldsymbol{\beta}$
  - $\boldsymbol{\beta}$: coefficient vector with $p = d + 1$ entries
  - $\mathbf{A}$: an $n \times (d+1)$ feature matrix
  - $\hat{\mathbf{y}}$: the $n$ predicted values
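A sketch of the matrix form in NumPy, building the feature matrix with a leading column of ones (the toy numbers are placeholders, not the diabetes data):

```python
import numpy as np

# Toy data: n = 4 samples, d = 2 features
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5],
              [4.0, 3.0]])
beta = np.array([0.5, 1.0, -2.0])              # [beta_0, beta_1, beta_2]

A = np.column_stack([np.ones(X.shape[0]), X])  # n x (d+1) feature matrix
yhat = A @ beta                                # vector of n predicted values
print(A.shape, yhat)
```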
In-Class Exercise
Outline
- Motivating Example: Understanding glucose levels in diabetes patients
- Multiple variable linear models
- Least squares solutions
- Computing the solutions in Python
- Special case: Simple linear regression
- Extensions
Least Squares Model Fitting
- How do we select the parameters $\boldsymbol{\beta} = (\beta_0, \ldots, \beta_d)$?
- Define $\hat{y}_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_d x_{id}$
  - Predicted value on sample $i$ for parameters $\boldsymbol{\beta} = (\beta_0, \ldots, \beta_d)$
- Define the residual sum of squares:
  $\mathrm{RSS}(\boldsymbol{\beta}) := \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
  - Note that $\hat{y}_i$ is implicitly a function of $\boldsymbol{\beta} = (\beta_0, \ldots, \beta_d)$.
  - Also called the sum of squared residuals (SSR) and sum of squared errors (SSE).
- Least squares solution: Find $\boldsymbol{\beta}$ to minimize the RSS.
Variants of RSS
- Often use some variant of the RSS
  - Note: these names are not standard
- Residual sum of squares: $\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
- RSS per sample, or Mean Squared Error:
  $\mathrm{MSE} = \frac{\mathrm{RSS}}{n} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
- Normalized RSS or normalized MSE:
  $\frac{\mathrm{RSS}}{n\, s_y^2} = \frac{\mathrm{MSE}}{s_y^2} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$
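These variants are one-liners in NumPy. A minimal sketch (y and yhat are assumed to be arrays of targets and predictions; the values below are placeholders):

```python
import numpy as np

def rss_variants(y, yhat):
    """Return the RSS, MSE, and normalized MSE for targets y and predictions yhat."""
    rss = np.sum((y - yhat) ** 2)
    mse = rss / len(y)
    norm_mse = rss / np.sum((y - y.mean()) ** 2)
    return rss, mse, norm_mse

# Example usage with placeholder values
y = np.array([3.0, 5.0, 7.0, 9.0])
yhat = np.array([2.8, 5.1, 7.3, 8.7])
print(rss_variants(y, yhat))
```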
Finding Parameters via Optimization: A General ML Recipe
- General ML problem:
  - Pick a model with parameters
  - Get data
  - Pick a loss function
    - Measures goodness of fit of the model to the data
    - Function of the parameters
  - Find the parameters that minimize the loss
- Multiple linear regression:
  - Linear model: $\hat{y} = \beta_0 + \beta_1 x_1 + \cdots + \beta_d x_d$
  - Data: $(\mathbf{x}_i, y_i)$, $i = 1, 2, \ldots, n$
  - Loss function: $\mathrm{RSS}(\beta_0, \ldots, \beta_d) := \sum_i (y_i - \hat{y}_i)^2$
  - Select $\boldsymbol{\beta} = (\beta_0, \ldots, \beta_d)$ to minimize $\mathrm{RSS}(\boldsymbol{\beta})$
RSS as a Vector Norm
- RSS is given by the sum:
  $\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
- Define the norm of a vector:
  - $\|\mathbf{x}\| = \left(x_1^2 + \cdots + x_n^2\right)^{1/2}$
  - Standard Euclidean norm.
  - Sometimes called the $\ell_2$ norm; $\ell$ is for Lebesgue.
- Write the RSS in vector form: $\mathrm{RSS} = \|\mathbf{y} - \hat{\mathbf{y}}\|^2$
Least Squares Solution
- Consider the cost function given by the RSS:
  $\mathrm{RSS}(\boldsymbol{\beta}) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad
  \hat{y}_i = \sum_{j=0}^{d} A_{ij}\beta_j$
  - The vector $\boldsymbol{\beta}$ that minimizes the RSS is called the least-squares solution.
- Least squares solution: The vector $\boldsymbol{\beta}$ that minimizes the RSS is
  $\hat{\boldsymbol{\beta}} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{y}$
  - Can compute the best coefficient vector analytically.
  - Just solve a linear set of equations.
  - Will show the proof below.
Proving the LS Formula
- Least squares formula: The vector $\boldsymbol{\beta}$ that minimizes the RSS is
  $\hat{\boldsymbol{\beta}} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{y}$
- To prove this formula, we will:
  - Review gradients of multi-variable functions
  - Compute the gradient $\nabla \mathrm{RSS}(\boldsymbol{\beta})$
  - Solve $\nabla \mathrm{RSS}(\boldsymbol{\beta}) = 0$
Gradients of Multi-Variable Functions
- Consider a scalar-valued function of a vector: $f(\boldsymbol{\beta}) = f(\beta_1, \ldots, \beta_n)$
- The gradient is the column vector:
  $\nabla f(\boldsymbol{\beta}) = \begin{bmatrix} \partial f(\boldsymbol{\beta})/\partial\beta_1 \\ \vdots \\ \partial f(\boldsymbol{\beta})/\partial\beta_n \end{bmatrix}$
- Represents the direction of maximum increase.
- At a local minimum or maximum: $\nabla f(\boldsymbol{\beta}) = 0$
  - Solve $n$ equations in $n$ unknowns.
- Example: $f(\beta_1, \beta_2) = \beta_1\sin\beta_2 + \beta_1^2\beta_2$.
  - Compute $\nabla f(\boldsymbol{\beta})$. Solution on the board.
Proof of the LS Formula
- Consider the cost function given by the RSS:
  $\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad
  \hat{y}_i = \sum_{j=0}^{d} A_{ij}\beta_j$
  - The vector $\boldsymbol{\beta}$ that minimizes the RSS is called the least-squares solution.
- Compute the partial derivatives via the chain rule:
  $\frac{\partial\,\mathrm{RSS}}{\partial\beta_j} = -2\sum_{i=1}^{n}(y_i - \hat{y}_i)A_{ij}, \quad j = 0, 1, \ldots, d$
- Matrix form: $\mathrm{RSS} = \|A\boldsymbol{\beta} - \mathbf{y}\|^2$, $\quad \nabla \mathrm{RSS} = -2A^T(\mathbf{y} - \mathbf{A}\boldsymbol{\beta})$
- Solution: $A^T(\mathbf{y} - A\boldsymbol{\beta}) = 0 \;\Rightarrow\; \boldsymbol{\beta} = (A^TA)^{-1}A^T\mathbf{y}$ (the least squares solution of the equation $A\boldsymbol{\beta} = \mathbf{y}$)
- Minimum RSS: $\mathrm{RSS} = \mathbf{y}^T\left(I - A(A^TA)^{-1}A^T\right)\mathbf{y}$
  - Proof on the board.
LS Solution via Auto-Correlation Functions
- Each data sample has a linear feature vector:
  $A_i = (A_{i0}, \cdots, A_{id}) = (1, x_{i1}, \cdots, x_{id})$
- Define the sample auto-correlation matrix and cross-correlation vector:
  - $R_{AA} = \frac{1}{n}A^TA, \quad R_{AA}[\ell, m] = \frac{1}{n}\sum_{i=1}^{n} A_{i\ell}A_{im}$ (correlation of feature $\ell$ and feature $m$)
  - $R_{Ay} = \frac{1}{n}A^T\mathbf{y}, \quad R_{Ay}[\ell] = \frac{1}{n}\sum_{i=1}^{n} A_{i\ell}\,y_i$ (correlation of feature $\ell$ and the target)
- The least squares solution is: $\boldsymbol{\beta} = R_{AA}^{-1}R_{Ay}$
Mean Removed Form of the LS Solution
- Often useful to remove the mean from the data before fitting.
- Sample means: $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \quad \bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \quad \bar{\mathbf{x}} = (\bar{x}_1, \cdots, \bar{x}_d)$
- Define the mean-removed data: $\tilde{x}_{ij} = x_{ij} - \bar{x}_j, \quad \tilde{y}_i = y_i - \bar{y}$
- Sample covariance matrix and cross-covariance vector:
  - $S_{xx}[\ell, m] = \frac{1}{n}\sum_{i=1}^{n}(x_{i\ell} - \bar{x}_\ell)(x_{im} - \bar{x}_m), \quad S_{xx} = \frac{1}{n}\tilde{X}^T\tilde{X}$
  - $S_{xy}[\ell] = \frac{1}{n}\sum_{i=1}^{n}(x_{i\ell} - \bar{x}_\ell)(y_i - \bar{y}), \quad S_{xy} = \frac{1}{n}\tilde{X}^T\tilde{\mathbf{y}}$
- Mean-removed form of the least squares solution:
  $\hat{y} = \boldsymbol{\beta}_{1:d} \cdot \mathbf{x} + \beta_0, \quad \boldsymbol{\beta}_{1:d} = S_{xx}^{-1}S_{xy}, \quad \beta_0 = \bar{y} - \boldsymbol{\beta}_{1:d} \cdot \bar{\mathbf{x}}$
- Proof: on the board.
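A sketch of the mean-removed computation in NumPy (illustrative, using the diabetes arrays loaded earlier; not the exact demo code):

```python
import numpy as np
from sklearn import datasets

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

# Mean-removed data
xbar, ybar = X.mean(axis=0), y.mean()
Xt, yt = X - xbar, y - ybar

# Sample covariance matrix and cross-covariance vector
n = X.shape[0]
Sxx = (Xt.T @ Xt) / n
Sxy = (Xt.T @ yt) / n

# Mean-removed least squares solution
beta_1d = np.linalg.solve(Sxx, Sxy)   # slope vector beta_{1:d}
beta0 = ybar - beta_1d @ xbar         # intercept beta_0
print(beta0, beta_1d)
```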
$R^2$: Goodness of Fit
- Multiple variable coefficient of determination:
  $R^2 = \frac{s_y^2 - \mathrm{MSE}}{s_y^2} = 1 - \frac{\mathrm{MSE}}{s_y^2}$
  - $\mathrm{MSE} = \frac{\mathrm{RSS}}{n} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
  - The sample variance is $s_y^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2$, with $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.
- Interpretation:
  - $\frac{\mathrm{MSE}}{s_y^2} = \frac{\text{error with the linear predictor}}{\text{error predicting by the mean}}$
  - $R^2$ = fraction of variance reduced or "explained" by the model.
- On the training data (not necessarily on the test data):
  - $R^2 \in [0, 1]$ always
  - $R^2 \approx 1 \Rightarrow$ the linear model provides a good fit
  - $R^2 \approx 0 \Rightarrow$ the linear model provides a poor fit
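A quick sketch of computing $R^2$ from predictions; the manual formula is compared against scikit-learn's r2_score helper (the arrays are placeholders):

```python
import numpy as np
from sklearn.metrics import r2_score

y = np.array([3.0, 5.0, 7.0, 9.0])      # placeholder targets
yhat = np.array([2.8, 5.1, 7.3, 8.7])   # placeholder predictions

mse = np.mean((y - yhat) ** 2)
sy2 = np.mean((y - y.mean()) ** 2)
r2_manual = 1 - mse / sy2

print(r2_manual, r2_score(y, yhat))     # the two values agree
```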
In-Class Exercise
Outline
- Motivating Example: Understanding glucose levels in diabetes patients
- Multiple variable linear models
- Least squares solutions
- Computing the solutions in Python
- Special case: Simple linear regression
- Extensions
Arrays and Vectors in Python and MATLAB
- There are some key differences between MATLAB and Python that you need to get used to.
- MATLAB:
  - All arrays have at least 2 dimensions.
  - Vectors are $1 \times n$ (row) vectors or $n \times 1$ (column) vectors.
  - Matrix-vector multiplication syntax depends on whether the vector is on the left or right: x'*A or A*x.
- Python:
  - Arrays can have 1, 2, 3, ... dimensions.
  - Vectors can be 1D arrays; matrices are generally 2D arrays.
  - Vectors that are 1D arrays are neither row nor column vectors.
  - If x is 1D and A is 2D, then left and right multiplication use the same syntax: x.dot(A) and A.dot(x).
- Lecture notes: We will generally treat $\mathbf{x}$ and $\mathbf{x}^T$ the same.
  - Can write $\mathbf{x} = (x_1, \ldots, x_n)$ and still multiply by a matrix on the left or right.
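A short NumPy sketch of the 1D-array behaviour described above (the matrix is square just so both products are defined for the same x):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])    # 2D array, shape (2, 2)
x = np.array([1.0, 2.0])      # 1D array, shape (2,): neither a row nor a column vector

print(x.dot(A))               # "left" multiply, like x^T A  -> [ 7. 10.]
print(A.dot(x))               # "right" multiply, like A x   -> [ 5. 11.]
print(x.shape, x.T.shape)     # transposing a 1D array changes nothing: (2,) (2,)
```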
Fitting Using sklearn
- Return to the diabetes data example.
- All code is in the demo.
- Divide the data into two portions:
  - Training data: first 300 samples
  - Test data: remaining 142 samples
- Train the model on the training data.
- Test the model (i.e., measure the RSS) on the test data.
- The reason for splitting the data is discussed in the next lecture.
Manually Computing the Solution
- Use a NumPy linear algebra routine to solve
  $\boldsymbol{\beta} = (A^TA)^{-1}A^T\mathbf{y}$
- Common mistake:
  - Compute the matrix inverse $P = (A^TA)^{-1}$, then compute $\boldsymbol{\beta} = PA^T\mathbf{y}$.
  - A full matrix inverse is VERY slow and not needed.
  - Can directly solve the linear system $A\boldsymbol{\beta} = \mathbf{y}$.
  - NumPy has routines to solve this directly.
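A sketch of the manual computation on the training portion of the diabetes data; np.linalg.lstsq solves the least squares system directly, and the explicit-inverse version is shown only as the anti-pattern the slide warns about.

```python
import numpy as np
from sklearn import datasets

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target
Xtr, ytr = X[:300], y[:300]                         # training portion (first 300 samples)

A = np.column_stack([np.ones(Xtr.shape[0]), Xtr])   # add the intercept column

# Recommended: solve the least squares problem A beta = y directly
beta, _, _, _ = np.linalg.lstsq(A, ytr, rcond=None)

# Anti-pattern: forms an explicit inverse unnecessarily
beta_slow = np.linalg.inv(A.T @ A) @ A.T @ ytr

print(np.allclose(beta, beta_slow))                 # True up to numerical precision
```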
Calling the sklearn Linear Regression Method
- Construct a linear regression object.
- Run it on the training data.
- Predict values on the test data.
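A minimal sketch of these three steps with scikit-learn, using the first-300 / remaining-142 split from the earlier slide (the demo notebook's code may differ in details):

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target
Xtr, ytr = X[:300], y[:300]     # training data
Xts, yts = X[300:], y[300:]     # test data

regr = LinearRegression()       # construct the linear regression object
regr.fit(Xtr, ytr)              # run it on the training data
yhat = regr.predict(Xts)        # predict values on the test data

rss_test = np.sum((yts - yhat) ** 2)
print(regr.intercept_, regr.coef_)
print("test RSS =", rss_test)
```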
In-Class Exercise
Outline
- Motivating Example: Understanding glucose levels in diabetes patients
- Multiple variable linear models
- Least squares solutions
- Computing in Python
- Extensions
Transformed Linear Models
- Standard linear model: $\hat{y} = \beta_0 + \beta_1 x_1 + \cdots + \beta_d x_d$
- The linear model may be too restrictive:
  - The relation between $x$ and $y$ can be nonlinear.
- Useful to look at models in transformed form:
  $\hat{y} = \beta_1\phi_1(\mathbf{x}) + \cdots + \beta_p\phi_p(\mathbf{x})$
  - Each function $\phi_j(\mathbf{x}) = \phi_j(x_1, \ldots, x_d)$ is called a basis function.
  - Each basis function may be nonlinear and a function of multiple variables.
- Can write in vector form: $\hat{y} = \boldsymbol{\phi}(\mathbf{x}) \cdot \boldsymbol{\beta}$
  - $\boldsymbol{\phi}(\mathbf{x}) = [\phi_1(\mathbf{x}), \ldots, \phi_p(\mathbf{x})], \quad \boldsymbol{\beta} = [\beta_1, \ldots, \beta_p]$
Fitting Transformed Linear Models
- Consider the transformed linear model:
  $\hat{y} = \beta_1\phi_1(\mathbf{x}) + \cdots + \beta_p\phi_p(\mathbf{x})$
- We can fit this model exactly as before:
  - Given data $(\mathbf{x}_i, y_i)$, $i = 1, \ldots, n$
  - Want to fit the model from the transformed variables $\phi_j(\mathbf{x})$ to the target $y$
  - Define the transformed matrix:
    $A = \begin{bmatrix} \phi_1(\mathbf{x}_1) & \cdots & \phi_p(\mathbf{x}_1) \\ \vdots & & \vdots \\ \phi_1(\mathbf{x}_n) & \cdots & \phi_p(\mathbf{x}_n) \end{bmatrix}$
  - Predictions: $\hat{\mathbf{y}} = A\boldsymbol{\beta}$
  - Least squares fit: $\hat{\boldsymbol{\beta}} = (A^TA)^{-1}A^T\mathbf{y}$
Example: Polynomial Fitting
- Suppose $y$ only depends on a single variable $x$.
- Want to fit a polynomial model:
  - $y \approx \beta_0 + \beta_1 x + \cdots + \beta_d x^d$
- Given data $(x_i, y_i)$, $i = 1, \ldots, n$.
- Take basis functions $\phi_j(x) = x^j$, $j = 0, \ldots, d$.
- Transformed model: $\hat{y} = \beta_0\phi_0(x) + \cdots + \beta_d\phi_d(x)$
- The transformed matrix is:
  $\mathbf{A} = \begin{bmatrix} 1 & x_1 & \cdots & x_1^d \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^d \end{bmatrix}, \quad
  \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \vdots \\ \beta_d \end{bmatrix}$
  - $p = d + 1$ transformed features from 1 original feature
- Will discuss how to select $d$ in the next lecture (a code sketch follows below).
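A sketch of polynomial fitting through the transformed matrix, on synthetic 1D data (np.vander builds the powers of x; increasing=True orders the columns as 1, x, x^2, ...):

```python
import numpy as np

# Synthetic 1D data: roughly quadratic in x plus noise (placeholders)
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 50)
y = 1.0 + 0.5 * x - 2.0 * x**2 + 0.3 * rng.standard_normal(x.size)

d = 2                                     # polynomial degree
A = np.vander(x, d + 1, increasing=True)  # columns [1, x, x^2], shape (n, d+1)

beta, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
print(beta)                               # close to [1.0, 0.5, -2.0]
yhat = A @ beta                           # fitted values
```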
Other Nonlinear Examples
- Multinomial model: $\hat{y} = c + b_1 x_1 + b_2 x_2 + a_1 x_1^2 + a_2 x_1 x_2 + a_3 x_2^2$
  - Contains all second-order terms.
  - Define the parameter vector $\boldsymbol{\beta} = [c, b_1, b_2, a_1, a_2, a_3]$.
  - Transformed vector $\boldsymbol{\phi}(x_1, x_2) = [1, x_1, x_2, x_1^2, x_1 x_2, x_2^2]$.
  - Note that the features are nonlinear functions of $\mathbf{x} = [x_1, x_2]$.
- Exponential model: $\hat{y} = a_1 e^{-\lambda_1 x} + \cdots + a_d e^{-\lambda_d x}$
  - If the parameters $\lambda_1, \ldots, \lambda_d$ are fixed, then the model is linear in the parameters $a_1, \ldots, a_d$.
  - Parameter vector $\boldsymbol{\beta} = [a_1, \ldots, a_d]$.
  - Transformed vector $\boldsymbol{\phi}(x) = [e^{-\lambda_1 x}, \ldots, e^{-\lambda_d x}]$.
  - But if the parameters $\lambda_1, \ldots, \lambda_d$ are not fixed, the model is nonlinear in $\lambda_1, \ldots, \lambda_d$.
Linear Models via Re-Parametrization
- Sometimes models can be made into a linear model via re-parametrization.
- Example: Consider the model $\hat{y} = Ax_1(1 + Be^{-x_2})$
  - Parameters $(A, B)$
- This is nonlinear in $(A, B)$ due to the product $AB$: $\hat{y} = Ax_1 + ABx_1 e^{-x_2}$
- But we can define a new set of parameters:
  - $\beta_1 = A$ and $\beta_2 = AB$
- Then $\hat{y} = \beta_1 x_1 + \beta_2 x_1 e^{-x_2}$
- Basis functions: $\boldsymbol{\phi}(x_1, x_2) = [x_1,\; x_1 e^{-x_2}]$
- After we solve for $\beta_1, \beta_2$ we can recover $A, B$ by inverting the equations:
  $A = \beta_1, \quad B = \frac{\beta_2}{A}$
Example: Learning Linear Systems
- Linear system: $y_t = a_1 y_{t-1} + \cdots + a_p y_{t-p} + b_0 x_t + \cdots + b_q x_{t-q} + w_t$
- Transfer function: $H(z) = \dfrac{b_0 + \cdots + b_q z^{-q}}{1 - a_1 z^{-1} - \cdots - a_p z^{-p}}$
- Given an input sequence and an output sequence for $T$ samples, how do we determine
  $\boldsymbol{\beta} = [a_1, \cdots, a_p, b_0, \cdots, b_q]^T$?
- Can be solved using linear regression!
- Write $\mathbf{y} = A\boldsymbol{\beta} + \mathbf{w}$ and define $A$, $\mathbf{y}$.
  - See the homework problem.
- Many applications:
  - Learning dynamics in robots / mechanical systems
  - Modeling responses in neural systems
  - Stock market time series
  - Speech modeling: fit a model every 25 ms.
One Hot Encoding
- Suppose that one feature $x_j$ is a categorical variable.
- Example: Predict the price of a car, $y$, given the model $x_1$ and interior space $x_2$:
  - Suppose there are 3 different models of a car (Ford, BMW, GM).
  - Bad idea: arbitrarily assign an index to each possible car model.
    - Can give unreasonable relations.
- One-hot encoding example:
  - With 3 possible categories, represent $x_1$ using 3 binary features $(u_1, u_2, u_3)$:

    Model | u1 | u2 | u3
    Ford  | 1  | 0  | 0
    BMW   | 0  | 1  | 0
    GM    | 0  | 0  | 1

  - Model: $y = \beta_0 + \beta_1 u_1 + \beta_2 u_2 + \beta_3 u_3 + \beta_4 x_2$
  - Essentially obtain 3 different models:
    - Ford: $y = \beta_0 + \beta_1 + \beta_4 x_2$
    - BMW: $y = \beta_0 + \beta_2 + \beta_4 x_2$
    - GM: $y = \beta_0 + \beta_3 + \beta_4 x_2$
  - Allows different intercepts (or mean values) for different categories! (A small code sketch follows below.)
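A brief sketch of one-hot encoding with scikit-learn's OneHotEncoder; the car-model and interior-space values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: categorical car model plus a numeric interior-space feature
models = np.array([["Ford"], ["BMW"], ["GM"], ["Ford"]])
space = np.array([[2.5], [3.0], [2.8], [2.2]])

enc = OneHotEncoder()
U = enc.fit_transform(models).toarray()  # binary features, one column per category
print(enc.categories_)                   # category order used for the columns

X = np.hstack([U, space])                # one-hot columns combined with the numeric feature
print(X)
```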
Lab: Robot Calibration
- Predict the current draw
  - Needed to predict power consumption
- Predictors:
  - Joint angles, velocity and acceleration
  - Strain gauge readings (measure of load)
- Full website at TU Dortmund, Germany:
  - http://www.rst.e-technik.tu-dortmund.de/cms/en/research/robotics/TUDOR_engl/index.html
  - http://www.rst.e-technik.tu-dortmund.de/forschung/robot-toolbox/MERIt/MERIt_Documentation.pdf