Unit 3: Multiple Linear Regression
Transcript of Unit 3: Multiple Linear Regression
EL-GY 6143/CS-GY 6923: Introduction to Machine Learning
Prof. Pei Liu
Learning Objectives
- Formulate a machine learning model as a multiple linear regression model.
  - Identify the prediction vector and target for the problem.
- Write the regression model in matrix form. Write the feature matrix.
- Compute the least-squares solution for the regression coefficients on training data.
- Derive the least-squares formula from minimization of the RSS.
- Manipulate 2D arrays in Python (indexing, stacking, computing shapes, ...).
- Compute the LS solution using Python linear algebra and machine learning packages.
Outline
- Motivating Example: Understanding glucose levels in diabetes patients
- Multiple variable linear models
- Least squares solutions
- Computing the solutions in Python
- Special case: Simple linear regression
- Extensions
Example: Blood Glucose Level
- Diabetes patients must monitor their glucose level.
- What causes blood glucose levels to rise and fall?
- Many factors.
- We know the mechanisms qualitatively.
- But quantitative models are difficult to obtain:
  - Hard to derive from first principles.
  - Difficult to model the physiological process precisely.
- Can machine learning help?
Data from AIM 94 Experiment
- Data collected as a series of events:
  - Eating
  - Exercise
  - Insulin dosage
- Target variable: glucose level, monitored over time
Demo on GitHub
- All code is available on GitHub:
  https://github.com/pliugithub/MachineLearning/blob/master/unit03_mult_lin_reg/demo_glucose.ipynb
Using Google Colaboratory
- Two options for running the demos:
- Option 1:
  - Clone the GitHub repository to your local machine and run it using Jupyter notebook.
  - Need to install all the software correctly.
- Option 2: Run in the cloud on Google Colaboratory.
- For Option 2:
  - Go to https://colab.research.google.com/
  - File -> Open Notebook
  - Select the GitHub tab
  - Enter the GitHub URL: https://github.com/sdrangan/introml
  - Select unit03_mult_lin_reg/demo1_glucose.ipynb
Demo on Google Colab
Outline
- Motivating Example: Understanding glucose levels in diabetes patients
- Multiple variable linear models
- Least squares solutions
- Computing the solutions in Python
- Special case: Simple linear regression
- Extensions
Simple vs. Multiple Regression
- Simple linear regression: one predictor (feature)
  - Scalar predictor $x$
  - Linear model: $\hat{y} = \beta_0 + \beta_1 x$
  - Can only account for one variable
- Multiple linear regression: multiple predictors (features)
  - Vector predictor $\mathbf{x} = (x_1, \ldots, x_d)$
  - Linear model: $\hat{y} = \beta_0 + \beta_1 x_1 + \cdots + \beta_d x_d$
  - Can account for multiple predictors
  - Reduces to simple linear regression when $d = 1$
Comparison to Single Variable Models
- We could compute models for each variable separately:
  $y = a_1 + b_1 x_1, \quad y = a_2 + b_2 x_2, \quad \vdots$
- But this doesn't provide a way to account for joint effects.
- Example: Consider three linear models for predicting longevity:
  - A: Longevity vs. some factor in diet (e.g., amount of fiber consumed)
  - B: Longevity vs. exercise
  - C: Longevity vs. diet AND exercise
  - What does C tell you that A and B do not?
Special Case: Single Variable
- Suppose $d = 1$ predictor.
- Feature matrix and coefficient vector:
  $A = \begin{bmatrix} 1 & x_1 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}, \quad
  \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix}$
- LS solution: $\hat{\boldsymbol{\beta}} = \left(\tfrac{1}{n}A^T A\right)^{-1}\tfrac{1}{n}A^T\mathbf{y} = P^{-1}\mathbf{b}$, with
  $P = \begin{bmatrix} 1 & \bar{x} \\ \bar{x} & \overline{x^2} \end{bmatrix}, \quad
  \mathbf{b} = \begin{bmatrix} \bar{y} \\ \overline{xy} \end{bmatrix}$
- Obtain the single variable solutions for the coefficients (after some algebra):
  $\beta_1 = \frac{s_{xy}}{s_x^2}, \quad \beta_0 = \bar{y} - \beta_1\bar{x}, \quad
  r^2 = \frac{s_{xy}^2}{s_x^2\, s_y^2}$
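The closed-form expressions above translate directly into a few lines of NumPy. The sketch below is illustrative only; the arrays x and y are made-up placeholders, not the diabetes data from the demo.

```python
import numpy as np

# Placeholder 1D data (not the diabetes data)
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Sample statistics
xm, ym = x.mean(), y.mean()
sxy = np.mean((x - xm) * (y - ym))   # sample cross-covariance s_xy
sxx = np.mean((x - xm) ** 2)         # sample variance s_x^2
syy = np.mean((y - ym) ** 2)         # sample variance s_y^2

# Single-variable least squares coefficients and r^2
beta1 = sxy / sxx
beta0 = ym - beta1 * xm
r2 = sxy ** 2 / (sxx * syy)
print(beta0, beta1, r2)
```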
Loading the Data
- scikit-learn package:
  - Many methods for machine learning
  - Datasets
  - Will use throughout this class
- The diabetes dataset is one example.
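As a sketch of the loading step, the diabetes dataset can be pulled directly from scikit-learn; the exact cell in the demo notebook may differ slightly.

```python
from sklearn import datasets

# Load the diabetes dataset shipped with scikit-learn
diabetes = datasets.load_diabetes()
X = diabetes.data       # feature matrix, shape (442, 10)
y = diabetes.target     # target vector, shape (442,)

print(X.shape, y.shape)
print(diabetes.feature_names)   # ['age', 'sex', 'bmi', 'bp', 's1', ..., 's6']
```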
Simple Linear Regression for Diabetes Data
- Try a fit of each variable individually.
- Compute the $r^2$ coefficient for each variable.
- Use the formula on the previous slide.
- The "best" individual variable is a poor fit: $r^2 \approx 0.34$.
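A minimal sketch of these per-variable fits, using the formulas from the previous slide (this is not the exact demo code; it simply loops over the columns of the diabetes data loaded above).

```python
import numpy as np
from sklearn import datasets

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

ym = y.mean()
syy = np.mean((y - ym) ** 2)

# Fit y against each feature individually and report r^2
for j, name in enumerate(diabetes.feature_names):
    xj = X[:, j]
    sxy = np.mean((xj - xj.mean()) * (y - ym))
    sxx = np.mean((xj - xj.mean()) ** 2)
    r2 = sxy ** 2 / (sxx * syy)
    print(f"{name:4s}  r^2 = {r2:.3f}")
```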
Scatter Plot
- No one variable explains glucose well.
- Multiple linear regression could be much better.
Outline
- Motivating Example: Understanding glucose levels in diabetes patients
- Multiple variable linear models
- Least squares solutions
- Computing the solutions in Python
- Special case: Simple linear regression
- Extensions
Finding a Mathematical Model
- Goal: Find a function to predict glucose level from the 10 attributes.
- Problem: Several attributes
  - Need a multi-variable function
- Attributes: $x_1$: Age, $x_2$: Sex, $x_3$: BMI, $x_4$: BP, $x_5$: S1, ..., $x_{10}$: S6
- Target: $y$ = glucose level
- Model: $y \approx \hat{y} = f(x_1, \ldots, x_{10})$
Matrix Representation of Data
- Data is a matrix and a target vector.
- $n$ samples: one sample per row.
- $d$ features / attributes / predictors: one feature per column.
- This example:
  - $y_i$ = blood glucose measurement of the $i$-th sample
  - $x_{i,j}$ = $j$-th feature of the $i$-th sample
  - $\mathbf{x}_i^T = [x_{i,1}, x_{i,2}, \ldots, x_{i,d}]$: feature or predictor vector
  - The $i$-th sample is $(\mathbf{x}_i, y_i)$
- Feature matrix (samples in rows, attributes in columns) and target vector:
  $X = \begin{bmatrix} x_{11} & \cdots & x_{1d} \\ \vdots & \ddots & \vdots \\ x_{n1} & \cdots & x_{nd} \end{bmatrix}, \quad
  \mathbf{y} = \begin{bmatrix} y_1 \\ \vdots \\ y_n \end{bmatrix}$
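In NumPy this is simply a 2D array for the feature matrix and a 1D array for the target. A small sketch with the diabetes data (variable names are illustrative):

```python
from sklearn import datasets

diabetes = datasets.load_diabetes()
X = diabetes.data      # 2D array: n samples (rows) x d features (columns)
y = diabetes.target    # 1D array of n targets

n, d = X.shape
print(n, d)            # 442 samples, 10 features

x0 = X[0, :]           # feature vector of the first sample, shape (d,)
bmi = X[:, 2]          # one feature (column) across all samples, shape (n,)
```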
In-Class Exercise
Outline
- Motivating Example: Understanding glucose levels in diabetes patients
- Multiple variable linear models
- Least squares solutions
- Computing the solutions in Python
- Special case: Simple linear regression
- Extensions
Multivariable Linear Model for Glucose
- Goal: Find a function to predict glucose level from the 10 attributes.
- Linear Model: Assume glucose is a linear function of the predictors:
  glucose $\approx$ prediction $= \beta_0 + \beta_1\,\text{Age} + \cdots + \beta_4\,\text{BP} + \beta_5\,\text{S1} + \cdots + \beta_{10}\,\text{S6}$
- General form:
  - Attributes (10 features): Age, Sex, BMI, BP, S1, ..., S6, so $\mathbf{x} = [x_1, \ldots, x_{10}]$
  - Target: $y$ = glucose level
  - $y \approx \hat{y} = \beta_0 + \beta_1 x_1 + \cdots + \beta_4 x_4 + \beta_5 x_5 + \cdots + \beta_{10} x_{10}$, where $\beta_0$ is the intercept.
Multiple Variable Linear Model
- Vector of features: $\mathbf{x} = [x_1, \ldots, x_d]$
  - $d$ features (also known as predictors, independent variables, attributes, covariates, ...)
- Single target variable $y$
  - What we want to predict
- Linear model: Make a prediction $\hat{y}$:
  $y \approx \hat{y} = \beta_0 + \beta_1 x_1 + \cdots + \beta_d x_d$
- Data for training:
  - Samples are $(\mathbf{x}_i, y_i)$, $i = 1, 2, \ldots, n$.
  - Each sample has a vector of features $\mathbf{x}_i = [x_{i1}, \ldots, x_{id}]$ and a scalar target $y_i$.
- Problem: Learn the best coefficients $\boldsymbol{\beta} = [\beta_0, \beta_1, \ldots, \beta_d]$ from the training data.
Example: Heart Rate Increase
- Linear Model: HR increase $\approx \beta_0 + \beta_1\,(\text{mins exercise}) + \beta_2\,(\text{exercise intensity})$
- Data (measuring fitness of athletes):

  Subject number | HR before | HR after | Mins on treadmill | Speed (min/km) | Days exercise/week
  123            | 60        | 90       | 1                 | 5.2            | 3
  456            | 80        | 110      | 2                 | 4.1            | 1
  789            | 70        | 130      | 5                 | 3.5            | 2
  ...            | ...       | ...      | ...               | ...            | ...

- https://www.mercurynews.com/2017/10/29/4851089/
Why Use a Linear Model?
- Many natural phenomena have a linear relationship.
- Predictor has small variation:
  - Suppose $y = f(x)$.
  - If the variation of $x$ is small around some value $x_0$, then
    $y \approx f(x_0) + f'(x_0)(x - x_0) = \beta_0 + \beta_1 x$, with
    $\beta_0 = f(x_0) - f'(x_0)\,x_0, \quad \beta_1 = f'(x_0)$
- Simple to compute.
- Easy to interpret the relation:
  - Coefficient $\beta_j$ indicates the importance of feature $j$ for the target.
- Advanced: Gaussian random variables:
  - If two variables are jointly Gaussian, the optimal predictor of one from the other is a linear predictor.
Matrix Review
- Consider
  $A = \begin{bmatrix} 1 & 2 \\ 3 & 4 \\ 5 & 6 \end{bmatrix}, \quad
  B = \begin{bmatrix} 2 & 0 \\ 3 & 2 \end{bmatrix}, \quad
  \mathbf{x} = \begin{bmatrix} 2 \\ 3 \end{bmatrix}$
- Compute (computations on the board):
  - Matrix-vector multiply: $A\mathbf{x}$
  - Transpose: $A^T$
  - Matrix multiply: $AB$
  - Solution to linear equations: solve for $\mathbf{u}$: $\mathbf{x} = B\mathbf{u}$
  - Matrix inverse: $B^{-1}$
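The same board computations can be checked in NumPy; a short sketch (not part of the original slides):

```python
import numpy as np

A = np.array([[1, 2], [3, 4], [5, 6]])
B = np.array([[2, 0], [3, 2]])
x = np.array([2, 3])

print(A @ x)                  # matrix-vector multiply A x
print(A.T)                    # transpose A^T
print(A @ B)                  # matrix multiply A B
print(np.linalg.solve(B, x))  # solve B u = x for u
print(np.linalg.inv(B))       # matrix inverse B^{-1}
```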
Slopes, Intercept and Inner Products
- Model with coefficients $\boldsymbol{\beta}$: $\hat{y} = \beta_0 + \beta_1 x_1 + \cdots + \beta_d x_d$
- Sometimes use the weight-bias version:
  $\hat{y} = b + w_1 x_1 + \cdots + w_d x_d$
  - $b = \beta_0$: bias or intercept
  - $\mathbf{w} = \boldsymbol{\beta}_{1:d} = [\beta_1, \ldots, \beta_d]$: weights or slope vector
- Can write either with an inner product:
  $\hat{y} = \beta_0 + \boldsymbol{\beta}_{1:d} \cdot \mathbf{x}$ or $\hat{y} = b + \mathbf{w} \cdot \mathbf{x}$
- Inner product:
  - $\mathbf{w} \cdot \mathbf{x} = \sum_{j=1}^{d} w_j x_j$
  - Will use the alternate notation $\mathbf{w}^T\mathbf{x} = \langle \mathbf{w}, \mathbf{x} \rangle$
Matrix Form of Linear Regression
- Data: $(\mathbf{x}_i, y_i)$, $i = 1, \ldots, n$
- Predicted value for the $i$-th sample: $\hat{y}_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_d x_{id}$
- Matrix form:
  $\begin{bmatrix} \hat{y}_1 \\ \hat{y}_2 \\ \vdots \\ \hat{y}_n \end{bmatrix} =
  \begin{bmatrix} 1 & x_{11} & \cdots & x_{1d} \\ 1 & x_{21} & \cdots & x_{2d} \\ \vdots & \vdots & & \vdots \\ 1 & x_{n1} & \cdots & x_{nd} \end{bmatrix}
  \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_d \end{bmatrix}$
- Matrix equation: $\hat{\mathbf{y}} = \mathbf{A}\boldsymbol{\beta}$
  - $\boldsymbol{\beta}$: coefficient vector with $p = d + 1$ entries
  - $\mathbf{A}$: an $n \times (d+1)$ feature matrix
  - $\hat{\mathbf{y}}$: the $n$ predicted values
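A sketch of the matrix form in NumPy, building the feature matrix with a leading column of ones (the toy numbers are placeholders, not the diabetes data):

```python
import numpy as np

# Toy data: n = 4 samples, d = 2 features
X = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [3.0, 1.5],
              [4.0, 3.0]])
beta = np.array([0.5, 1.0, -2.0])              # [beta_0, beta_1, beta_2]

A = np.column_stack([np.ones(X.shape[0]), X])  # n x (d+1) feature matrix
yhat = A @ beta                                # vector of n predicted values
print(A.shape, yhat)
```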
In-Class Exercise
Outline
- Motivating Example: Understanding glucose levels in diabetes patients
- Multiple variable linear models
- Least squares solutions
- Computing the solutions in Python
- Special case: Simple linear regression
- Extensions
Least Squares Model Fitting
- How do we select the parameters $\boldsymbol{\beta} = (\beta_0, \ldots, \beta_d)$?
- Define $\hat{y}_i = \beta_0 + \beta_1 x_{i1} + \cdots + \beta_d x_{id}$
  - Predicted value on sample $i$ for parameters $\boldsymbol{\beta} = (\beta_0, \ldots, \beta_d)$
- Define the residual sum of squares:
  $\mathrm{RSS}(\boldsymbol{\beta}) := \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
  - Note that $\hat{y}_i$ is implicitly a function of $\boldsymbol{\beta} = (\beta_0, \ldots, \beta_d)$.
  - Also called the sum of squared residuals (SSR) and sum of squared errors (SSE).
- Least squares solution: Find $\boldsymbol{\beta}$ to minimize the RSS.
Variants of RSS
- Often use some variant of the RSS
  - Note: these names are not standard
- Residual sum of squares: $\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
- RSS per sample, or Mean Squared Error:
  $\mathrm{MSE} = \frac{\mathrm{RSS}}{n} = \frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
- Normalized RSS or normalized MSE:
  $\frac{\mathrm{RSS}}{n\, s_y^2} = \frac{\mathrm{MSE}}{s_y^2} = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$
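These variants are one-liners in NumPy. A minimal sketch (y and yhat are assumed to be arrays of targets and predictions; the values below are placeholders):

```python
import numpy as np

def rss_variants(y, yhat):
    """Return the RSS, MSE, and normalized MSE for targets y and predictions yhat."""
    rss = np.sum((y - yhat) ** 2)
    mse = rss / len(y)
    norm_mse = rss / np.sum((y - y.mean()) ** 2)
    return rss, mse, norm_mse

# Example usage with placeholder values
y = np.array([3.0, 5.0, 7.0, 9.0])
yhat = np.array([2.8, 5.1, 7.3, 8.7])
print(rss_variants(y, yhat))
```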
Finding Parameters via Optimization: A General ML Recipe
- General ML problem:
  - Pick a model with parameters
  - Get data
  - Pick a loss function
    - Measures goodness of fit of the model to the data
    - Function of the parameters
  - Find the parameters that minimize the loss
- Multiple linear regression:
  - Linear model: $\hat{y} = \beta_0 + \beta_1 x_1 + \cdots + \beta_d x_d$
  - Data: $(\mathbf{x}_i, y_i)$, $i = 1, 2, \ldots, n$
  - Loss function: $\mathrm{RSS}(\beta_0, \ldots, \beta_d) := \sum_i (y_i - \hat{y}_i)^2$
  - Select $\boldsymbol{\beta} = (\beta_0, \ldots, \beta_d)$ to minimize $\mathrm{RSS}(\boldsymbol{\beta})$
RSS as a Vector Norm
- RSS is given by the sum:
  $\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$
- Define the norm of a vector:
  - $\|\mathbf{x}\| = \left(x_1^2 + \cdots + x_n^2\right)^{1/2}$
  - Standard Euclidean norm.
  - Sometimes called the $\ell_2$ norm; $\ell$ is for Lebesgue.
- Write the RSS in vector form: $\mathrm{RSS} = \|\mathbf{y} - \hat{\mathbf{y}}\|^2$
Least Squares Solution
- Consider the cost function given by the RSS:
  $\mathrm{RSS}(\boldsymbol{\beta}) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad
  \hat{y}_i = \sum_{j=0}^{d} A_{ij}\beta_j$
  - The vector $\boldsymbol{\beta}$ that minimizes the RSS is called the least-squares solution.
- Least squares solution: The vector $\boldsymbol{\beta}$ that minimizes the RSS is
  $\hat{\boldsymbol{\beta}} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{y}$
  - Can compute the best coefficient vector analytically.
  - Just solve a linear set of equations.
  - Will show the proof below.
Proving the LS Formula
- Least squares formula: The vector $\boldsymbol{\beta}$ that minimizes the RSS is
  $\hat{\boldsymbol{\beta}} = (\mathbf{A}^T\mathbf{A})^{-1}\mathbf{A}^T\mathbf{y}$
- To prove this formula, we will:
  - Review gradients of multi-variable functions
  - Compute the gradient $\nabla \mathrm{RSS}(\boldsymbol{\beta})$
  - Solve $\nabla \mathrm{RSS}(\boldsymbol{\beta}) = 0$
Gradients of Multi-Variable Functions
- Consider a scalar-valued function of a vector: $f(\boldsymbol{\beta}) = f(\beta_1, \ldots, \beta_n)$
- The gradient is the column vector:
  $\nabla f(\boldsymbol{\beta}) = \begin{bmatrix} \partial f(\boldsymbol{\beta})/\partial\beta_1 \\ \vdots \\ \partial f(\boldsymbol{\beta})/\partial\beta_n \end{bmatrix}$
- Represents the direction of maximum increase.
- At a local minimum or maximum: $\nabla f(\boldsymbol{\beta}) = 0$
  - Solve $n$ equations in $n$ unknowns.
- Example: $f(\beta_1, \beta_2) = \beta_1\sin\beta_2 + \beta_1^2\beta_2$.
  - Compute $\nabla f(\boldsymbol{\beta})$. Solution on the board.
Proof of the LS Formula
- Consider the cost function given by the RSS:
  $\mathrm{RSS} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad
  \hat{y}_i = \sum_{j=0}^{d} A_{ij}\beta_j$
  - The vector $\boldsymbol{\beta}$ that minimizes the RSS is called the least-squares solution.
- Compute the partial derivatives via the chain rule:
  $\frac{\partial\,\mathrm{RSS}}{\partial\beta_j} = -2\sum_{i=1}^{n}(y_i - \hat{y}_i)A_{ij}, \quad j = 0, 1, \ldots, d$
- Matrix form: $\mathrm{RSS} = \|A\boldsymbol{\beta} - \mathbf{y}\|^2$, $\quad \nabla \mathrm{RSS} = -2A^T(\mathbf{y} - \mathbf{A}\boldsymbol{\beta})$
- Solution: $A^T(\mathbf{y} - A\boldsymbol{\beta}) = 0 \;\Rightarrow\; \boldsymbol{\beta} = (A^TA)^{-1}A^T\mathbf{y}$ (the least squares solution of the equation $A\boldsymbol{\beta} = \mathbf{y}$)
- Minimum RSS: $\mathrm{RSS} = \mathbf{y}^T\left(I - A(A^TA)^{-1}A^T\right)\mathbf{y}$
  - Proof on the board.
LS Solution via Auto-Correlation Functions
- Each data sample has a linear feature vector:
  $A_i = (A_{i0}, \cdots, A_{id}) = (1, x_{i1}, \cdots, x_{id})$
- Define the sample auto-correlation matrix and cross-correlation vector:
  - $R_{AA} = \frac{1}{n}A^TA, \quad R_{AA}[\ell, m] = \frac{1}{n}\sum_{i=1}^{n} A_{i\ell}A_{im}$ (correlation of feature $\ell$ and feature $m$)
  - $R_{Ay} = \frac{1}{n}A^T\mathbf{y}, \quad R_{Ay}[\ell] = \frac{1}{n}\sum_{i=1}^{n} A_{i\ell}\,y_i$ (correlation of feature $\ell$ and the target)
- The least squares solution is: $\boldsymbol{\beta} = R_{AA}^{-1}R_{Ay}$
Mean Removed Form of the LS Solution
- Often useful to remove the mean from the data before fitting.
- Sample means: $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i, \quad \bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}, \quad \bar{\mathbf{x}} = (\bar{x}_1, \cdots, \bar{x}_d)$
- Define the mean-removed data: $\tilde{x}_{ij} = x_{ij} - \bar{x}_j, \quad \tilde{y}_i = y_i - \bar{y}$
- Sample covariance matrix and cross-covariance vector:
  - $S_{xx}[\ell, m] = \frac{1}{n}\sum_{i=1}^{n}(x_{i\ell} - \bar{x}_\ell)(x_{im} - \bar{x}_m), \quad S_{xx} = \frac{1}{n}\tilde{X}^T\tilde{X}$
  - $S_{xy}[\ell] = \frac{1}{n}\sum_{i=1}^{n}(x_{i\ell} - \bar{x}_\ell)(y_i - \bar{y}), \quad S_{xy} = \frac{1}{n}\tilde{X}^T\tilde{\mathbf{y}}$
- Mean-removed form of the least squares solution:
  $\hat{y} = \boldsymbol{\beta}_{1:d} \cdot \mathbf{x} + \beta_0, \quad \boldsymbol{\beta}_{1:d} = S_{xx}^{-1}S_{xy}, \quad \beta_0 = \bar{y} - \boldsymbol{\beta}_{1:d} \cdot \bar{\mathbf{x}}$
- Proof: on the board.
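A sketch of the mean-removed computation in NumPy (illustrative, using the diabetes arrays loaded earlier; not the exact demo code):

```python
import numpy as np
from sklearn import datasets

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target

# Mean-removed data
xbar, ybar = X.mean(axis=0), y.mean()
Xt, yt = X - xbar, y - ybar

# Sample covariance matrix and cross-covariance vector
n = X.shape[0]
Sxx = (Xt.T @ Xt) / n
Sxy = (Xt.T @ yt) / n

# Mean-removed least squares solution
beta_1d = np.linalg.solve(Sxx, Sxy)   # slope vector beta_{1:d}
beta0 = ybar - beta_1d @ xbar         # intercept beta_0
print(beta0, beta_1d)
```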
$R^2$: Goodness of Fit
- Multiple variable coefficient of determination:
  $R^2 = \frac{s_y^2 - \mathrm{MSE}}{s_y^2} = 1 - \frac{\mathrm{MSE}}{s_y^2}$
  - $\mathrm{MSE} = \frac{\mathrm{RSS}}{n} = \frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^2$
  - The sample variance is $s_y^2 = \frac{1}{n}\sum_{i=1}^{n}(y_i - \bar{y})^2$, with $\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i$.
- Interpretation:
  - $\frac{\mathrm{MSE}}{s_y^2} = \frac{\text{error with the linear predictor}}{\text{error predicting by the mean}}$
  - $R^2$ = fraction of variance reduced or "explained" by the model.
- On the training data (not necessarily on the test data):
  - $R^2 \in [0, 1]$ always
  - $R^2 \approx 1 \Rightarrow$ the linear model provides a good fit
  - $R^2 \approx 0 \Rightarrow$ the linear model provides a poor fit
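A quick sketch of computing $R^2$ from predictions; the manual formula is compared against scikit-learn's r2_score helper (the arrays are placeholders):

```python
import numpy as np
from sklearn.metrics import r2_score

y = np.array([3.0, 5.0, 7.0, 9.0])      # placeholder targets
yhat = np.array([2.8, 5.1, 7.3, 8.7])   # placeholder predictions

mse = np.mean((y - yhat) ** 2)
sy2 = np.mean((y - y.mean()) ** 2)
r2_manual = 1 - mse / sy2

print(r2_manual, r2_score(y, yhat))     # the two values agree
```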
In-Class Exercise
Outline
- Motivating Example: Understanding glucose levels in diabetes patients
- Multiple variable linear models
- Least squares solutions
- Computing the solutions in Python
- Special case: Simple linear regression
- Extensions
Arrays and Vectors in Python and MATLAB
- There are some key differences between MATLAB and Python that you need to get used to.
- MATLAB:
  - All arrays have at least 2 dimensions.
  - Vectors are $1 \times n$ (row) vectors or $n \times 1$ (column) vectors.
  - Matrix-vector multiplication syntax depends on whether the vector is on the left or right: x'*A or A*x.
- Python:
  - Arrays can have 1, 2, 3, ... dimensions.
  - Vectors can be 1D arrays; matrices are generally 2D arrays.
  - Vectors that are 1D arrays are neither row nor column vectors.
  - If x is 1D and A is 2D, then left and right multiplication use the same syntax: x.dot(A) and A.dot(x).
- Lecture notes: We will generally treat $\mathbf{x}$ and $\mathbf{x}^T$ the same.
  - Can write $\mathbf{x} = (x_1, \ldots, x_n)$ and still multiply by a matrix on the left or right.
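A short NumPy sketch of the 1D-array behaviour described above (the matrix is square just so both products are defined for the same x):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])    # 2D array, shape (2, 2)
x = np.array([1.0, 2.0])      # 1D array, shape (2,): neither a row nor a column vector

print(x.dot(A))               # "left" multiply, like x^T A  -> [ 7. 10.]
print(A.dot(x))               # "right" multiply, like A x   -> [ 5. 11.]
print(x.shape, x.T.shape)     # transposing a 1D array changes nothing: (2,) (2,)
```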
Fitting Using sklearn
- Return to the diabetes data example.
- All code is in the demo.
- Divide the data into two portions:
  - Training data: first 300 samples
  - Test data: remaining 142 samples
- Train the model on the training data.
- Test the model (i.e., measure the RSS) on the test data.
- The reason for splitting the data is discussed in the next lecture.
Manually Computing the Solution
- Use a NumPy linear algebra routine to solve
  $\boldsymbol{\beta} = (A^TA)^{-1}A^T\mathbf{y}$
- Common mistake:
  - Compute the matrix inverse $P = (A^TA)^{-1}$, then compute $\boldsymbol{\beta} = PA^T\mathbf{y}$.
  - A full matrix inverse is VERY slow and not needed.
  - Can directly solve the linear system $A\boldsymbol{\beta} = \mathbf{y}$.
  - NumPy has routines to solve this directly.
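A sketch of the manual computation on the training portion of the diabetes data; np.linalg.lstsq solves the least squares system directly, and the explicit-inverse version is shown only as the anti-pattern the slide warns about.

```python
import numpy as np
from sklearn import datasets

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target
Xtr, ytr = X[:300], y[:300]                         # training portion (first 300 samples)

A = np.column_stack([np.ones(Xtr.shape[0]), Xtr])   # add the intercept column

# Recommended: solve the least squares problem A beta = y directly
beta, _, _, _ = np.linalg.lstsq(A, ytr, rcond=None)

# Anti-pattern: forms an explicit inverse unnecessarily
beta_slow = np.linalg.inv(A.T @ A) @ A.T @ ytr

print(np.allclose(beta, beta_slow))                 # True up to numerical precision
```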
Calling the sklearn Linear Regression Method
- Construct a linear regression object.
- Run it on the training data.
- Predict values on the test data.
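A minimal sketch of these three steps with scikit-learn, using the first-300 / remaining-142 split from the earlier slide (the demo notebook's code may differ in details):

```python
import numpy as np
from sklearn import datasets
from sklearn.linear_model import LinearRegression

diabetes = datasets.load_diabetes()
X, y = diabetes.data, diabetes.target
Xtr, ytr = X[:300], y[:300]     # training data
Xts, yts = X[300:], y[300:]     # test data

regr = LinearRegression()       # construct the linear regression object
regr.fit(Xtr, ytr)              # run it on the training data
yhat = regr.predict(Xts)        # predict values on the test data

rss_test = np.sum((yts - yhat) ** 2)
print(regr.intercept_, regr.coef_)
print("test RSS =", rss_test)
```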
In-Class Exercise
Outline
- Motivating Example: Understanding glucose levels in diabetes patients
- Multiple variable linear models
- Least squares solutions
- Computing in Python
- Extensions
Transformed Linear Models
- Standard linear model: $\hat{y} = \beta_0 + \beta_1 x_1 + \cdots + \beta_d x_d$
- The linear model may be too restrictive:
  - The relation between $x$ and $y$ can be nonlinear.
- Useful to look at models in transformed form:
  $\hat{y} = \beta_1\phi_1(\mathbf{x}) + \cdots + \beta_p\phi_p(\mathbf{x})$
  - Each function $\phi_j(\mathbf{x}) = \phi_j(x_1, \ldots, x_d)$ is called a basis function.
  - Each basis function may be nonlinear and a function of multiple variables.
- Can write in vector form: $\hat{y} = \boldsymbol{\phi}(\mathbf{x}) \cdot \boldsymbol{\beta}$
  - $\boldsymbol{\phi}(\mathbf{x}) = [\phi_1(\mathbf{x}), \ldots, \phi_p(\mathbf{x})], \quad \boldsymbol{\beta} = [\beta_1, \ldots, \beta_p]$
Fitting Transformed Linear Models
- Consider the transformed linear model:
  $\hat{y} = \beta_1\phi_1(\mathbf{x}) + \cdots + \beta_p\phi_p(\mathbf{x})$
- We can fit this model exactly as before:
  - Given data $(\mathbf{x}_i, y_i)$, $i = 1, \ldots, n$
  - Want to fit the model from the transformed variables $\phi_j(\mathbf{x})$ to the target $y$
  - Define the transformed matrix:
    $A = \begin{bmatrix} \phi_1(\mathbf{x}_1) & \cdots & \phi_p(\mathbf{x}_1) \\ \vdots & & \vdots \\ \phi_1(\mathbf{x}_n) & \cdots & \phi_p(\mathbf{x}_n) \end{bmatrix}$
  - Predictions: $\hat{\mathbf{y}} = A\boldsymbol{\beta}$
  - Least squares fit: $\hat{\boldsymbol{\beta}} = (A^TA)^{-1}A^T\mathbf{y}$
Example: Polynomial Fitting
- Suppose $y$ only depends on a single variable $x$.
- Want to fit a polynomial model:
  - $y \approx \beta_0 + \beta_1 x + \cdots + \beta_d x^d$
- Given data $(x_i, y_i)$, $i = 1, \ldots, n$.
- Take basis functions $\phi_j(x) = x^j$, $j = 0, \ldots, d$.
- Transformed model: $\hat{y} = \beta_0\phi_0(x) + \cdots + \beta_d\phi_d(x)$
- The transformed matrix is:
  $\mathbf{A} = \begin{bmatrix} 1 & x_1 & \cdots & x_1^d \\ \vdots & \vdots & & \vdots \\ 1 & x_n & \cdots & x_n^d \end{bmatrix}, \quad
  \boldsymbol{\beta} = \begin{bmatrix} \beta_0 \\ \vdots \\ \beta_d \end{bmatrix}$
  - $p = d + 1$ transformed features from 1 original feature
- Will discuss how to select $d$ in the next lecture (a code sketch follows below).
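A sketch of polynomial fitting through the transformed matrix, on synthetic 1D data (np.vander builds the powers of x; increasing=True orders the columns as 1, x, x^2, ...):

```python
import numpy as np

# Synthetic 1D data: roughly quadratic in x plus noise (placeholders)
rng = np.random.default_rng(0)
x = np.linspace(-2, 2, 50)
y = 1.0 + 0.5 * x - 2.0 * x**2 + 0.3 * rng.standard_normal(x.size)

d = 2                                     # polynomial degree
A = np.vander(x, d + 1, increasing=True)  # columns [1, x, x^2], shape (n, d+1)

beta, _, _, _ = np.linalg.lstsq(A, y, rcond=None)
print(beta)                               # close to [1.0, 0.5, -2.0]
yhat = A @ beta                           # fitted values
```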
Other Nonlinear Examples
- Multinomial model: $\hat{y} = c + b_1 x_1 + b_2 x_2 + a_1 x_1^2 + a_2 x_1 x_2 + a_3 x_2^2$
  - Contains all second-order terms.
  - Define the parameter vector $\boldsymbol{\beta} = [c, b_1, b_2, a_1, a_2, a_3]$.
  - Transformed vector $\boldsymbol{\phi}(x_1, x_2) = [1, x_1, x_2, x_1^2, x_1 x_2, x_2^2]$.
  - Note that the features are nonlinear functions of $\mathbf{x} = [x_1, x_2]$.
- Exponential model: $\hat{y} = a_1 e^{-\lambda_1 x} + \cdots + a_d e^{-\lambda_d x}$
  - If the parameters $\lambda_1, \ldots, \lambda_d$ are fixed, then the model is linear in the parameters $a_1, \ldots, a_d$.
  - Parameter vector $\boldsymbol{\beta} = [a_1, \ldots, a_d]$.
  - Transformed vector $\boldsymbol{\phi}(x) = [e^{-\lambda_1 x}, \ldots, e^{-\lambda_d x}]$.
  - But if the parameters $\lambda_1, \ldots, \lambda_d$ are not fixed, the model is nonlinear in $\lambda_1, \ldots, \lambda_d$.
Linear Models via Re-Parametrization
- Sometimes models can be made into a linear model via re-parametrization.
- Example: Consider the model $\hat{y} = Ax_1(1 + Be^{-x_2})$
  - Parameters $(A, B)$
- This is nonlinear in $(A, B)$ due to the product $AB$: $\hat{y} = Ax_1 + ABx_1 e^{-x_2}$
- But we can define a new set of parameters:
  - $\beta_1 = A$ and $\beta_2 = AB$
- Then $\hat{y} = \beta_1 x_1 + \beta_2 x_1 e^{-x_2}$
- Basis functions: $\boldsymbol{\phi}(x_1, x_2) = [x_1,\; x_1 e^{-x_2}]$
- After we solve for $\beta_1, \beta_2$ we can recover $A, B$ by inverting the equations:
  $A = \beta_1, \quad B = \frac{\beta_2}{A}$
Example: Learning Linear Systems
- Linear system: $y_t = a_1 y_{t-1} + \cdots + a_p y_{t-p} + b_0 x_t + \cdots + b_q x_{t-q} + w_t$
- Transfer function: $H(z) = \dfrac{b_0 + \cdots + b_q z^{-q}}{1 - a_1 z^{-1} - \cdots - a_p z^{-p}}$
- Given an input sequence and an output sequence for $T$ samples, how do we determine
  $\boldsymbol{\beta} = [a_1, \cdots, a_p, b_0, \cdots, b_q]^T$?
- Can be solved using linear regression!
- Write $\mathbf{y} = A\boldsymbol{\beta} + \mathbf{w}$ and define $A$, $\mathbf{y}$.
  - See the homework problem.
- Many applications:
  - Learning dynamics in robots / mechanical systems
  - Modeling responses in neural systems
  - Stock market time series
  - Speech modeling: fit a model every 25 ms.
One Hot Encoding
- Suppose that one feature $x_j$ is a categorical variable.
- Example: Predict the price of a car, $y$, given the model $x_1$ and interior space $x_2$:
  - Suppose there are 3 different models of a car (Ford, BMW, GM).
  - Bad idea: arbitrarily assign an index to each possible car model.
    - Can give unreasonable relations.
- One-hot encoding example:
  - With 3 possible categories, represent $x_1$ using 3 binary features $(u_1, u_2, u_3)$:

    Model | u1 | u2 | u3
    Ford  | 1  | 0  | 0
    BMW   | 0  | 1  | 0
    GM    | 0  | 0  | 1

  - Model: $y = \beta_0 + \beta_1 u_1 + \beta_2 u_2 + \beta_3 u_3 + \beta_4 x_2$
  - Essentially obtain 3 different models:
    - Ford: $y = \beta_0 + \beta_1 + \beta_4 x_2$
    - BMW: $y = \beta_0 + \beta_2 + \beta_4 x_2$
    - GM: $y = \beta_0 + \beta_3 + \beta_4 x_2$
  - Allows different intercepts (or mean values) for different categories! (A small code sketch follows below.)
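A brief sketch of one-hot encoding with scikit-learn's OneHotEncoder; the car-model and interior-space values are made up for illustration.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Hypothetical data: categorical car model plus a numeric interior-space feature
models = np.array([["Ford"], ["BMW"], ["GM"], ["Ford"]])
space = np.array([[2.5], [3.0], [2.8], [2.2]])

enc = OneHotEncoder()
U = enc.fit_transform(models).toarray()  # binary features, one column per category
print(enc.categories_)                   # category order used for the columns

X = np.hstack([U, space])                # one-hot columns combined with the numeric feature
print(X)
```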
Lab: Robot Calibration
- Predict the current draw
  - Needed to predict power consumption
- Predictors:
  - Joint angles, velocity and acceleration
  - Strain gauge readings (measure of load)
- Full website at TU Dortmund, Germany:
  - http://www.rst.e-technik.tu-dortmund.de/cms/en/research/robotics/TUDOR_engl/index.html
  - http://www.rst.e-technik.tu-dortmund.de/forschung/robot-toolbox/MERIt/MERIt_Documentation.pdf