Unit 3: Multiple Linear Regression

Transcript of Unit 3: Multiple Linear Regression

Page 1: Unit 3 Multiple Linear Regression

NYU WIRELESS
Unit 3: Multiple Linear Regression
EL-GY 6143/CS-GY 6923: Introduction to Machine Learning
Prof. Pei Liu

Page 2: Unit 3 Multiple Linear Regression

Learning Objectives
• Formulate a machine learning model as a multiple linear regression model.
  ◦ Identify the prediction vector and target for the problem.
• Write the regression model in matrix form. Write the feature matrix.
• Compute the least-squares solution for the regression coefficients on training data.
• Derive the least-squares formula from minimization of the RSS.
• Manipulate 2D arrays in Python (indexing, stacking, computing shapes, …).
• Compute the LS solution using Python linear algebra and machine learning packages.

Page 3: Unit 3 Multiple Linear Regression

Outline
• Motivating Example: Understanding glucose levels in diabetes patients
• Multiple variable linear models
• Least squares solutions
• Computing the solutions in Python
• Special case: Simple linear regression
• Extensions

Page 4: Unit 3 Multiple Linear Regression

Example: Blood Glucose Level
• Diabetes patients must monitor their glucose level.
• What causes blood glucose levels to rise and fall?
• Many factors.
• We know the mechanisms qualitatively.
• But quantitative models are difficult to obtain:
  ◦ Hard to derive from first principles
  ◦ Difficult to model the physiological process precisely
• Can machine learning help?

Page 5: Unit 3 Multiple Linear Regression

Data from AIM 94 Experiment
• Data collected as a series of events:
  ◦ Eating
  ◦ Exercise
  ◦ Insulin dosage
• Target variable: glucose level, monitored over time

Page 6: Unit 3 Multiple Linear Regression

Demo on GitHub
• All code is available on GitHub:
  https://github.com/pliugithub/MachineLearning/blob/master/unit03_mult_lin_reg/demo_glucose.ipynb

Page 7: Unit 3 Multiple Linear Regression

Using Google Colaboratory
• Two options for running the demos:
• Option 1:
  ◦ Clone the GitHub repository to your local machine and run it using Jupyter notebook
  ◦ Need to install all the software correctly
• Option 2: Run in the cloud on Google Colaboratory
• For Option 2:
  ◦ Go to https://colab.research.google.com/
  ◦ File -> Open Notebook
  ◦ Select the GitHub tab
  ◦ Enter the GitHub URL: https://github.com/sdrangan/introml
  ◦ Select unit03_mult_lin_reg/demo1_glucose.ipynb

Page 8: Unit 3 Multiple Linear Regression

Demo on Google Colab

Page 9: Unit 3 Multiple Linear Regression

Outline
• Motivating Example: Understanding glucose levels in diabetes patients
• Multiple variable linear models
• Least squares solutions
• Computing the solutions in Python
• Special case: Simple linear regression
• Extensions

Page 10: Unit 3 Multiple Linear Regression

Simple vs. Multiple Regression
• Simple linear regression: one predictor (feature)
  ◦ Scalar predictor x
  ◦ Linear model: ŷ = β₀ + β₁x
  ◦ Can only account for one variable
• Multiple linear regression: multiple predictors (features)
  ◦ Vector predictor x = (x₁, …, x_k)
  ◦ Linear model: ŷ = β₀ + β₁x₁ + ⋯ + β_k x_k
  ◦ Can account for multiple predictors
  ◦ Reduces to simple linear regression when k = 1

Page 11: Unit 3 Multiple Linear Regression

Comparison to Single Variable Models
• We could compute models for each variable separately:

  y = a₁ + b₁x₁
  y = a₂ + b₂x₂
  ⋮

• But this doesn't provide a way to account for joint effects.
• Example: Consider three linear models for predicting longevity:
  ◦ A: Longevity vs. some factor in diet (e.g., amount of fiber consumed)
  ◦ B: Longevity vs. exercise
  ◦ C: Longevity vs. diet AND exercise
  ◦ What does C tell you that A and B do not?

Page 12: Unit 3 Multiple Linear Regression

Special Case: Single Variable
• Suppose k = 1 predictor.
• Feature matrix and coefficient vector:

  A = [ 1   x₁
        ⋮   ⋮
        1   x_n ],     β = [ β₀
                             β₁ ]

• LS solution: β = ((1/n)AᵀA)⁻¹ ((1/n)Aᵀy) = P⁻¹r, where

  P = (1/n)AᵀA = [ 1    x̄
                   x̄    (1/n)Σᵢ xᵢ² ],     r = (1/n)Aᵀy = [ ȳ
                                                            (1/n)Σᵢ xᵢyᵢ ]

• Obtain the single-variable solutions for the coefficients (after some algebra):

  β₁ = s_xy / s_x²,    β₀ = ȳ − β₁x̄,    R² = s_xy² / (s_x² s_y²)
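The single-variable formulas above are easy to check numerically. A minimal sketch, using synthetic data of my own choosing (the true coefficients 2 and 3 are assumptions for illustration) and cross-checking against NumPy's degree-1 polynomial fit:

```python
import numpy as np

# Synthetic data for illustration: y = 2 + 3x + small noise
rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = 2.0 + 3.0 * x + 0.1 * rng.normal(size=100)

# Single-variable least-squares formulas from the slide
s_xy = np.mean((x - x.mean()) * (y - y.mean()))  # sample cross-covariance
s_x2 = np.mean((x - x.mean()) ** 2)              # sample variance of x
beta1 = s_xy / s_x2
beta0 = y.mean() - beta1 * x.mean()

# Cross-check against NumPy's degree-1 least-squares polynomial fit
b1, b0 = np.polyfit(x, y, deg=1)
print(np.allclose([beta0, beta1], [b0, b1]))     # True
```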

Page 13: Unit 3 Multiple Linear Regression

Loading the Data
• scikit-learn package:
  ◦ Many methods for machine learning
  ◦ Datasets
  ◦ Will be used throughout this class
• The diabetes dataset is one example.
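As a sketch of what the demo notebook does, the diabetes dataset ships with scikit-learn and can be loaded directly:

```python
from sklearn.datasets import load_diabetes

# Load the diabetes dataset used throughout this unit
data = load_diabetes()
X, y = data.data, data.target
print(X.shape, y.shape)        # (442, 10) (442,)
print(data.feature_names[:4])  # ['age', 'sex', 'bmi', 'bp']
```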

Page 14: Unit 3 Multiple Linear Regression

Simple Linear Regression for Diabetes Data
• Try a fit of each variable individually.
• Compute the R_j² coefficient for each variable.
• Use the formula on the previous slide.
• The "best" individual variable is a poor fit:
  ◦ Best individual variable: R² ≈ 0.34

Page 15: Unit 3 Multiple Linear Regression

Scatter Plot
• No one variable explains glucose well.
• Multiple linear regression could be much better.

Page 16: Unit 3 Multiple Linear Regression

Outline
• Motivating Example: Understanding glucose levels in diabetes patients
• Multiple variable linear models
• Least squares solutions
• Computing the solutions in Python
• Special case: Simple linear regression
• Extensions

Page 17: Unit 3 Multiple Linear Regression

Finding a Mathematical Model
• Goal: Find a function to predict glucose level from the 10 attributes.
• Problem: several attributes
  ◦ Need a multi-variable function

  Attributes: x₁: Age, x₂: Sex, x₃: BMI, x₄: BP, x₅: S1, …, x₁₀: S6
  Target: y = glucose level
  Model: y ≈ ŷ = f(x₁, …, x₁₀)

Page 18: Unit 3 Multiple Linear Regression

Matrix Representation of Data
• Data is a matrix and a target vector:

  X = [ x₁₁  ⋯  x₁ₖ
         ⋮   ⋱   ⋮
        x_n1 ⋯  x_nk ]   (rows: samples, columns: attributes)

  y = [y₁, …, y_n]ᵀ   (target vector)

• n samples:
  ◦ One sample per row
• k features / attributes / predictors:
  ◦ One feature per column
• In this example:
  ◦ yᵢ = blood glucose measurement of the i-th sample
  ◦ x_ij = j-th feature of the i-th sample
  ◦ xᵢᵀ = [x_i1, x_i2, …, x_ik]: the feature or predictor vector
  ◦ The i-th sample is the pair (xᵢ, yᵢ)
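The matrix view above maps directly onto NumPy 2D arrays. A minimal sketch with made-up numbers, showing the indexing and shape operations listed in the learning objectives:

```python
import numpy as np

# Illustrative data matrix X (n samples x k features) and target vector y
X = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])   # n=2 samples, k=3 features
y = np.array([10.0, 20.0])

n, k = X.shape
print(n, k)       # 2 3
print(X[0, :])    # feature vector of the first sample: [1. 2. 3.]
print(X[:, 1])    # second feature across all samples:  [2. 5.]
```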

Page 19: Unit 3 Multiple Linear Regression

In-Class Exercise

Page 20: Unit 3 Multiple Linear Regression

Outline
• Motivating Example: Understanding glucose levels in diabetes patients
• Multiple variable linear models
• Least squares solutions
• Computing the solutions in Python
• Special case: Simple linear regression
• Extensions

Page 21: Unit 3 Multiple Linear Regression

Multivariable Linear Model for Glucose
• Goal: Find a function to predict glucose level from the 10 attributes.
  ◦ Features: x = [x₁, …, x₁₀] = (Age, Sex, BMI, BP, S1, …, S6)
  ◦ Target: y = glucose level, y ≈ ŷ = f(x₁, …, x₁₀)
• Linear model: Assume glucose is a linear function of the predictors:

  glucose ≈ prediction = β₀ + β₁·Age + ⋯ + β₄·BP + β₅·S1 + ⋯ + β₁₀·S6

• General form:

  y ≈ ŷ = β₀ + β₁x₁ + ⋯ + β₄x₄ + β₅x₅ + ⋯ + β₁₀x₁₀

  (β₀ is the intercept)

Page 22: Unit 3 Multiple Linear Regression

Multiple Variable Linear Model
• Vector of features: x = [x₁, …, x_k]
  ◦ k features (also known as predictors, independent variables, attributes, covariates, …)
• Single target variable y
  ◦ What we want to predict
• Linear model: Make a prediction ŷ:

  y ≈ ŷ = β₀ + β₁x₁ + ⋯ + β_k x_k

• Data for training:
  ◦ Samples are (xᵢ, yᵢ), i = 1, 2, …, n
  ◦ Each sample has a vector of features xᵢ = [x_i1, …, x_ik] and a scalar target yᵢ
• Problem: Learn the best coefficients β = [β₀, β₁, …, β_k] from the training data.

Page 23: Unit 3 Multiple Linear Regression

Example: Heart Rate Increase
• Linear model: HR increase ≈ β₀ + β₁·(mins exercise) + β₂·(exercise intensity)
• Data: measuring fitness of athletes

  Subject number | HR before | HR after | Mins on treadmill | Speed (min/km) | Days exercise/week
  123            | 60        | 90       | 1                 | 5.2            | 3
  456            | 80        | 110      | 2                 | 4.1            | 1
  789            | 70        | 130      | 5                 | 3.5            | 2
  ⋮              | ⋮         | ⋮        | ⋮                 | ⋮              | ⋮

  (Image source: https://www.mercurynews.com/2017/10/29/4851089/)

Page 24: Unit 3 Multiple Linear Regression

Why Use a Linear Model?
• Many natural phenomena have a linear relationship.
• Predictor has small variation:
  ◦ Suppose y = f(x).
  ◦ If the variation of x is small around some value x₀, then

    y ≈ f(x₀) + f′(x₀)(x − x₀) = β₀ + β₁x,
    β₀ = f(x₀) − f′(x₀)x₀,   β₁ = f′(x₀)

• Simple to compute.
• Easy to interpret the relation:
  ◦ Coefficient βⱼ indicates the importance of feature j for the target.
• Advanced: Gaussian random variables:
  ◦ If two variables are jointly Gaussian, the optimal predictor of one from the other is a linear predictor.

Page 25: Unit 3 Multiple Linear Regression

Matrix Review
• Consider

  A = [ 1  2
        3  4
        5  6 ],     B = [ 2  0
                          3  2 ],     x = [ 2
                                            3 ]

• Compute (computations on the board):
  ◦ Matrix-vector multiply: Ax
  ◦ Transpose: Aᵀ
  ◦ Matrix multiply: AB
  ◦ Solution to linear equations: solve x = Bu for u
  ◦ Matrix inverse: B⁻¹
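The same review computations can be done in NumPy; this sketch mirrors the matrices on the slide:

```python
import numpy as np

# The matrix-review computations from the slide, done in NumPy
A = np.array([[1, 2], [3, 4], [5, 6]])
B = np.array([[2, 0], [3, 2]])
x = np.array([2, 3])

print(A @ x)               # matrix-vector multiply: [ 8 18 28]
print(A.T)                 # transpose
print(A @ B)               # matrix multiply
u = np.linalg.solve(B, x)  # solve x = B u for u
print(u)                   # [1. 0.]
print(np.linalg.inv(B))    # matrix inverse
```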

Page 26: Unit 3 Multiple Linear Regression

Slopes, Intercept and Inner Products
• Model with coefficients β: ŷ = β₀ + β₁x₁ + ⋯ + β_k x_k
• Sometimes we use the weight-bias version:

  ŷ = b + w₁x₁ + ⋯ + w_k x_k

  ◦ b = β₀: bias or intercept
  ◦ w = β₁:ₖ = [β₁, …, β_k]: weights or slope vector
• Can write either form with an inner product:

  ŷ = β₀ + β₁:ₖ · x   or   ŷ = b + w · x

• Inner product:
  ◦ w · x = Σⱼ₌₁ᵏ wⱼxⱼ
  ◦ We will use the alternate notations wᵀx = ⟨w, x⟩

Page 27: Unit 3 Multiple Linear Regression

Matrix Form of Linear Regression
• Data: (xᵢ, yᵢ), i = 1, …, n
• Predicted value for the i-th sample: ŷᵢ = β₀ + β₁x_i1 + ⋯ + β_k x_ik
• Matrix form:

  ŷ = Aβ,  where

  A = [ 1  x₁₁  ⋯  x₁ₖ
        1  x₂₁  ⋯  x₂ₖ
        ⋮   ⋮   ⋯   ⋮
        1  x_n1 ⋯  x_nk ],   β = [β₀, β₁, …, β_k]ᵀ,   ŷ = [ŷ₁, ŷ₂, …, ŷ_n]ᵀ

  ◦ β: the coefficient vector, with p = k + 1 coefficients
  ◦ A: the n × (k + 1) feature matrix
  ◦ ŷ: the n predicted values

Page 28: Unit 3 Multiple Linear Regression

In-Class Exercise

Page 29: Unit 3 Multiple Linear Regression

Outline
• Motivating Example: Understanding glucose levels in diabetes patients
• Multiple variable linear models
• Least squares solutions
• Computing the solutions in Python
• Special case: Simple linear regression
• Extensions

Page 30: Unit 3 Multiple Linear Regression

Least Squares Model Fitting
• How do we select the parameters β = (β₀, …, β_k)?
• Define ŷᵢ = β₀ + β₁x_i1 + ⋯ + β_k x_ik
  ◦ The predicted value on sample i for parameters β = (β₀, …, β_k)
• Define the residual sum of squares:

  RSS(β) := Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²

  ◦ Note that ŷᵢ is implicitly a function of β = (β₀, …, β_k)
  ◦ Also called the sum of squared residuals (SSR) and the sum of squared errors (SSE)
• Least squares solution: Find β to minimize the RSS.

Page 31: Unit 3 Multiple Linear Regression

Variants of RSS
• We often use some variant of the RSS
  ◦ Note: these names are not standard
• Residual sum of squares: RSS = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
• RSS per sample, or Mean Squared Error:

  MSE = RSS/n = (1/n) Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²

• Normalized RSS or Normalized MSE:

  (RSS/n)/s_y² = MSE/s_y² = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)² / Σᵢ₌₁ⁿ (yᵢ − ȳ)²
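The three variants above differ only by scaling; a small sketch on hypothetical numbers makes the relationships concrete:

```python
import numpy as np

# Sketch: RSS, MSE, and normalized MSE on hypothetical data
y     = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 7.0, 8.0])

rss = np.sum((y - y_hat) ** 2)
mse = rss / len(y)
norm_mse = mse / np.var(y)   # np.var uses the 1/n convention, matching s_y^2
print(rss, mse, norm_mse)    # 1.5 0.375 0.075
```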

Page 32: Unit 3 Multiple Linear Regression

Finding Parameters via Optimization
A general ML recipe, and its instantiation for multiple linear regression:
• Pick a model with parameters
  ◦ Linear model: ŷ = β₀ + β₁x₁ + ⋯ + β_k x_k
• Get data
  ◦ Data: (xᵢ, yᵢ), i = 1, 2, …, n
• Pick a loss function
  ◦ Measures the goodness of fit of the model to the data
  ◦ A function of the parameters
  ◦ Loss function: RSS(β₀, …, β_k) := Σᵢ (yᵢ − ŷᵢ)²
• Find the parameters that minimize the loss
  ◦ Select β = (β₀, …, β_k) to minimize RSS(β)

Page 33: Unit 3 Multiple Linear Regression

RSS as a Vector Norm
• The RSS is given by the sum:

  RSS = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²

• Define the norm of a vector:
  ◦ ‖x‖ = (x₁² + ⋯ + x_n²)^(1/2)
  ◦ The standard Euclidean norm
  ◦ Sometimes called the ℓ-2 norm; ℓ is for Lebesgue
• Write the RSS in vector form:

  RSS = ‖y − ŷ‖²

Page 34: Unit 3 Multiple Linear Regression

Least Squares Solution
• Consider the RSS cost function:

  RSS(β) = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²,   ŷᵢ = Σⱼ₌₀ᵏ A_ij βⱼ

  ◦ The vector β that minimizes the RSS is called the least-squares solution
• Least squares solution: The vector β that minimizes the RSS is

  β̂ = (AᵀA)⁻¹Aᵀy

  ◦ Can compute the best coefficient vector analytically
  ◦ Just solve a linear set of equations
  ◦ Will show the proof below
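The closed-form solution can be sketched directly in NumPy; the synthetic data and true coefficients below are assumptions for illustration, and the result is cross-checked against `np.linalg.lstsq`:

```python
import numpy as np

# Sketch: least-squares solution via the normal equations, checked against lstsq
rng = np.random.default_rng(1)
n, k = 50, 3
X = rng.normal(size=(n, k))
beta_true = np.array([1.0, -2.0, 0.5, 3.0])   # [beta0, beta1, beta2, beta3]
A = np.column_stack([np.ones(n), X])
y = A @ beta_true + 0.01 * rng.normal(size=n)

beta_ne = np.linalg.solve(A.T @ A, A.T @ y)   # solve (A^T A) beta = A^T y
beta_ls, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(beta_ne, beta_ls))          # True
```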

Page 35: Unit 3 Multiple Linear Regression

Proving the LS Formula
• Least squares formula: The vector β that minimizes the RSS is

  β̂ = (AᵀA)⁻¹Aᵀy

• To prove this formula, we will:
  ◦ Review gradients of multi-variable functions
  ◦ Compute the gradient ∇RSS(β)
  ◦ Solve ∇RSS(β) = 0

Page 36: Unit 3 Multiple Linear Regression

Gradients of Multi-Variable Functions
• Consider a scalar-valued function of a vector: f(β) = f(β₁, …, β_n)
• The gradient is the column vector:

  ∇f(β) = [ ∂f(β)/∂β₁
            ⋮
            ∂f(β)/∂β_n ]

• Represents the direction of maximum increase
• At a local minimum or maximum: ∇f(β) = 0
  ◦ Solve n equations in n unknowns
• Example: f(β₁, β₂) = β₁ sin β₂ + β₁²β₂³
  ◦ Compute ∇f(β). Solution on the board.

Page 37: Unit 3 Multiple Linear Regression

Proof of the LS Formula
• Consider the RSS cost function:

  RSS = Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²,   ŷᵢ = Σⱼ₌₀ᵏ A_ij βⱼ

  ◦ The vector β that minimizes the RSS is called the least-squares solution
• Compute the partial derivatives via the chain rule:

  ∂RSS/∂βⱼ = −2 Σᵢ₌₁ⁿ (yᵢ − ŷᵢ) A_ij,   j = 0, 1, …, k

• Matrix form: RSS = ‖Aβ − y‖²,   ∇RSS = −2Aᵀ(y − Aβ)
• Solution: Aᵀ(y − Aβ) = 0  ⇒  β = (AᵀA)⁻¹Aᵀy
  (the least squares solution of the equation Aβ = y)
• Minimum RSS: RSS = yᵀ(I − A(AᵀA)⁻¹Aᵀ)y
  ◦ Proof on the board

Page 38: Unit 3 Multiple Linear Regression

LS Solution via Auto-Correlation Functions
• Each data sample has a linear feature vector:

  Aᵢ = [A_i0, ⋯, A_ik] = [1, x_i1, ⋯, x_ik]

• Define the sample auto-correlation matrix and cross-correlation vector:
  ◦ R_AA = (1/n)AᵀA,   R_AA(ℓ, m) = (1/n) Σᵢ₌₁ⁿ A_iℓ A_im   (correlation of feature ℓ and feature m)
  ◦ R_Ay = (1/n)Aᵀy,   R_Ay(ℓ) = (1/n) Σᵢ₌₁ⁿ A_iℓ yᵢ   (correlation of feature ℓ and the target)
• The least squares solution is: β = R_AA⁻¹ R_Ay

Page 39: Unit 3 Multiple Linear Regression

Mean Removed Form of the LS Solution
• It is often useful to remove the mean from the data before fitting.
• Sample means: ȳ = (1/n) Σᵢ₌₁ⁿ yᵢ,   x̄ⱼ = (1/n) Σᵢ₌₁ⁿ x_ij,   x̄ = [x̄₁, ⋯, x̄ₖ]
• Define the mean-removed data: X̃_ij = x_ij − x̄ⱼ,   ỹᵢ = yᵢ − ȳ
• Sample covariance matrix and cross-covariance vector:
  ◦ S_xx(ℓ, m) = (1/n) Σᵢ₌₁ⁿ (x_iℓ − x̄_ℓ)(x_im − x̄_m),   S_xx = (1/n) X̃ᵀX̃
  ◦ S_xy(ℓ) = (1/n) Σᵢ₌₁ⁿ (x_iℓ − x̄_ℓ)(yᵢ − ȳ),   S_xy = (1/n) X̃ᵀỹ
• Mean-removed form of the least squares solution:

  ŷ = β₁:ₖ · x + β₀,   β₁:ₖ = S_xx⁻¹ S_xy,   β₀ = ȳ − β₁:ₖ · x̄

• Proof: on the board
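The mean-removed form gives exactly the same fit as the full normal equations with an intercept column. A sketch on synthetic data (the coefficients and offsets below are my own choices for illustration):

```python
import numpy as np

# Sketch: mean-removed LS solution vs. the full solution with a ones column
rng = np.random.default_rng(2)
n, k = 80, 2
X = rng.normal(size=(n, k)) + 5.0   # nonzero-mean features
y = 4.0 + X @ np.array([2.0, -1.0]) + 0.01 * rng.normal(size=n)

# Mean-removed form
Xt = X - X.mean(axis=0)
yt = y - y.mean()
Sxx = (Xt.T @ Xt) / n
Sxy = (Xt.T @ yt) / n
w = np.linalg.solve(Sxx, Sxy)        # beta_{1:k}
b0 = y.mean() - X.mean(axis=0) @ w   # beta_0

# Full solution with a ones column
A = np.column_stack([np.ones(n), X])
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(np.allclose(np.concatenate([[b0], w]), beta))   # True
```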

Page 40: Unit 3 Multiple Linear Regression

R²: Goodness of Fit
• Multiple variable coefficient of determination:

  R² = (s_y² − MSE)/s_y² = 1 − MSE/s_y²

  ◦ MSE = RSS/n = (1/n) Σᵢ₌₁ⁿ (yᵢ − ŷᵢ)²
  ◦ The sample variance is: s_y² = (1/n) Σᵢ₌₁ⁿ (yᵢ − ȳ)²,   ȳ = (1/n) Σᵢ₌₁ⁿ yᵢ
• Interpretation:
  ◦ MSE/s_y² = (error with the linear predictor) / (error predicting by the mean)
  ◦ R² = fraction of variance reduced, or "explained", by the model
• On the training data (not necessarily on the test data):
  ◦ R² ∈ [0, 1] always
  ◦ R² ≈ 1 ⇒ the linear model provides a good fit
  ◦ R² ≈ 0 ⇒ the linear model provides a poor fit
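The definition above matches scikit-learn's `r2_score`; a sketch on small hypothetical numbers:

```python
import numpy as np
from sklearn.metrics import r2_score

# Sketch: R^2 from the slide's definition, on hypothetical data
y     = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([3.5, 4.5, 7.5, 8.5])

mse = np.mean((y - y_hat) ** 2)
sy2 = np.var(y)              # (1/n) sum (y_i - ybar)^2
r2 = 1.0 - mse / sy2
print(r2)                    # 0.95

# Same value as scikit-learn's r2_score
print(np.isclose(r2, r2_score(y, y_hat)))   # True
```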

Page 41: Unit 3 Multiple Linear Regression

In-Class Exercise

Page 42: Unit 3 Multiple Linear Regression

Outline
• Motivating Example: Understanding glucose levels in diabetes patients
• Multiple variable linear models
• Least squares solutions
• Computing the solutions in Python
• Special case: Simple linear regression
• Extensions

Page 43: Unit 3 Multiple Linear Regression

Arrays and Vectors in Python and MATLAB
• There are some key differences between MATLAB and Python that you need to get used to.
• MATLAB:
  ◦ All arrays have at least 2 dimensions
  ◦ Vectors are 1×N (row) or N×1 (column) vectors
  ◦ Matrix-vector multiplication syntax depends on whether the vector is on the left or the right: x'*A or A*x
• Python:
  ◦ Arrays can have 1, 2, 3, … dimensions
  ◦ Vectors can be 1D arrays; matrices are generally 2D arrays
  ◦ Vectors that are 1D arrays are neither row nor column vectors
  ◦ If x is 1D and A is 2D, left and right multiplication use the same syntax, with no transpose needed: x.dot(A) and A.dot(x)
• In the lecture notes, we will generally treat x and xᵀ the same.
  ◦ Can write x = (x₁, …, x_n) and still multiply by a matrix on the left or the right
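The Python behavior can be sketched as follows: a 1D array is treated as a column when it multiplies on the right and as a row when it multiplies on the left (the matrices here are made up for illustration):

```python
import numpy as np

# Sketch: 1D arrays in NumPy act as a row or column as needed
A = np.array([[1, 2], [3, 4], [5, 6]])   # 3x2 matrix
x = np.array([1.0, 1.0])                 # 1D array of length 2
z = np.array([1.0, 1.0, 1.0])           # 1D array of length 3

print(A.dot(x))         # x treated as a column: [ 3.  7. 11.]
print(z.dot(A))         # z treated as a row:    [ 9. 12.]
print(x.ndim, A.ndim)   # 1 2
```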

Page 44: Unit 3 Multiple Linear Regression

Fitting Using sklearn
• Return to the diabetes data example.
• All code is in the demo.
• Divide the data into two portions:
  ◦ Training data: first 300 samples
  ◦ Test data: remaining 142 samples
• Train the model on the training data.
• Test the model (i.e., measure the RSS) on the test data.
• The reason for splitting the data is discussed in the next lecture.

Page 45: Unit 3 Multiple Linear Regression

Manually Computing the Solution
• Use a numpy linear algebra routine to solve

  β = (AᵀA)⁻¹Aᵀy

• Common mistake:
  ◦ Compute the matrix inverse P = (AᵀA)⁻¹,
  ◦ then compute β = PAᵀy
  ◦ A full matrix inverse is VERY slow, and not needed
  ◦ Can directly solve the linear system Aβ = y in the least-squares sense
  ◦ Numpy has routines to solve this directly
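The three approaches can be compared directly; this sketch on random data (my own synthetic example) shows that they agree, while only the last avoids both the explicit inverse and the normal equations:

```python
import numpy as np

# Sketch: prefer a direct least-squares solver over forming the matrix inverse
rng = np.random.default_rng(3)
A = np.column_stack([np.ones(200), rng.normal(size=(200, 4))])
y = rng.normal(size=200)

# Slow/fragile: explicit inverse
beta_inv = np.linalg.inv(A.T @ A) @ (A.T @ y)
# Better: solve the normal equations as a linear system
beta_solve = np.linalg.solve(A.T @ A, A.T @ y)
# Best: dedicated least-squares routine (also handles rank-deficient A)
beta_lstsq, *_ = np.linalg.lstsq(A, y, rcond=None)

print(np.allclose(beta_inv, beta_solve), np.allclose(beta_solve, beta_lstsq))
```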

Page 46: Unit 3 Multiple Linear Regression

Calling the sklearn Linear Regression Method
• Construct a linear regression object.
• Run it on the training data.
• Predict values on the test data.
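A minimal sketch of this workflow on the diabetes data, using the 300/142 train-test split from the earlier slide (the exact code in the course demo may differ):

```python
import numpy as np
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression

X, y = load_diabetes(return_X_y=True)
Xtr, ytr = X[:300], y[:300]    # training data: first 300 samples
Xts, yts = X[300:], y[300:]    # test data: remaining 142 samples

regr = LinearRegression()      # construct the linear regression object
regr.fit(Xtr, ytr)             # run it on the training data
y_hat = regr.predict(Xts)      # predict values on the test data

rss_test = np.sum((yts - y_hat) ** 2)
print(y_hat.shape)             # (142,)
```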

Page 47: Unit 3 Multiple Linear Regression

In-Class Exercise

Page 48: Unit 3 Multiple Linear Regression

Outline
• Motivating Example: Understanding glucose levels in diabetes patients
• Multiple variable linear models
• Least squares solutions
• Computing in Python
• Extensions

Page 49: Unit 3 Multiple Linear Regression

Transformed Linear Models
• Standard linear model: ŷ = β₀ + β₁x₁ + ⋯ + β_k x_k
• A linear model may be too restrictive:
  ◦ The relation between x and y can be nonlinear
• Useful to look at models in transformed form:

  ŷ = β₁φ₁(x) + ⋯ + β_p φ_p(x)

  ◦ Each function φⱼ(x) = φⱼ(x₁, …, x_d) is called a basis function
  ◦ Each basis function may be nonlinear and a function of multiple variables
• Can write in vector form: ŷ = Φ(x) · β
  ◦ Φ(x) = [φ₁(x), …, φ_p(x)],   β = [β₁, …, β_p]

[Figure: nonlinear relation between x and y]

Page 50: Unit 3 Multiple Linear Regression

Fitting Transformed Linear Models
• Consider the transformed linear model:

  ŷ = β₁φ₁(x) + ⋯ + β_p φ_p(x)

• We can fit this model exactly as before:
  ◦ Given data (xᵢ, yᵢ), i = 1, …, n
  ◦ Want to fit the model from the transformed variables φⱼ(x) to the target y
  ◦ Define the transformed matrix:

    A = [ φ₁(x₁)  ⋯  φ_p(x₁)
           ⋮          ⋮
          φ₁(x_n) ⋯  φ_p(x_n) ]

  ◦ Predictions: ŷ = Aβ
  ◦ Least squares fit: β̂ = (AᵀA)⁻¹Aᵀy

Page 51: Unit 3 Multiple Linear Regression

Example: Polynomial Fitting
• Suppose y depends only on a single variable x.
• Want to fit a polynomial model:
  ◦ y ≈ β₀ + β₁x + ⋯ + β_d x^d
• Given data (xᵢ, yᵢ), i = 1, …, n
• Take basis functions φⱼ(x) = xʲ, j = 0, …, d
• Transformed model: ŷ = β₀φ₀(x) + ⋯ + β_d φ_d(x)
• The transformed matrix is:

  A = [ 1  x₁  ⋯  x₁^d
        ⋮   ⋮  ⋯   ⋮
        1  x_n ⋯  x_n^d ],   β = [β₀, ⋮, β_d]ᵀ

  ◦ p = d + 1 transformed features from 1 original feature
• Will discuss how to select d in the next lecture.
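A sketch of polynomial fitting as a transformed linear model, using `np.vander` to build the transformed matrix; the synthetic quadratic data below is my own example:

```python
import numpy as np

# Sketch: polynomial fitting via the transformed matrix A = [1, x, x^2]
rng = np.random.default_rng(4)
x = rng.uniform(-1, 1, size=60)
y = 1.0 - 2.0 * x + 0.5 * x**2 + 0.01 * rng.normal(size=60)

d = 2
A = np.vander(x, N=d + 1, increasing=True)   # columns [1, x, x^2]
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
print(beta)   # close to the true coefficients [1, -2, 0.5]
```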

Page 52: Unit 3 Multiple Linear Regression

Other Nonlinear Examples
• Multinomial model: ŷ = a + b₁x₁ + b₂x₂ + c₁x₁² + c₂x₁x₂ + c₃x₂²
  ◦ Contains all second-order terms
  ◦ Define the parameter vector β = [a, b₁, b₂, c₁, c₂, c₃]
  ◦ Transformed vector φ(x₁, x₂) = [1, x₁, x₂, x₁², x₁x₂, x₂²]
  ◦ Note that the features are nonlinear functions of x = [x₁, x₂]
• Exponential model: ŷ = a₁e^(−b₁x) + ⋯ + a_d e^(−b_d x)
  ◦ If the parameters b₁, …, b_d are fixed, the model is linear in the parameters a₁, …, a_d
  ◦ Parameter vector β = [a₁, …, a_d]
  ◦ Transformed vector φ(x) = [e^(−b₁x), …, e^(−b_d x)]
  ◦ But if the parameters b₁, …, b_d are not fixed, the model is nonlinear in b₁, …, b_d

Page 53: Unit 3 Multiple Linear Regression

Linear Models via Re-Parametrization
• Sometimes models can be made into a linear model via re-parametrization.
• Example: Consider the model ŷ = Ax₁(1 + Be^(−x₂))
  ◦ Parameters (A, B)
• This is nonlinear in (A, B) due to the product AB: ŷ = Ax₁ + ABx₁e^(−x₂)
• But we can define a new set of parameters:
  ◦ β₁ = A and β₂ = AB
• Then ŷ = β₁x₁ + β₂x₁e^(−x₂)
• Basis functions: φ(x₁, x₂) = [x₁, x₁e^(−x₂)]
• After we solve for β₁, β₂, we can recover A, B by inverting the equations:

  A = β₁,   B = β₂/A
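The re-parametrized fit can be sketched end to end; the true values A = 2, B = 0.5 and the data-generating setup are my own assumptions for illustration:

```python
import numpy as np

# Sketch: fit y = A*x1*(1 + B*exp(-x2)) via the re-parametrization beta1=A, beta2=A*B
rng = np.random.default_rng(5)
n = 100
x1 = rng.uniform(1, 2, size=n)
x2 = rng.uniform(0, 3, size=n)
A_true, B_true = 2.0, 0.5
y = A_true * x1 * (1 + B_true * np.exp(-x2)) + 0.001 * rng.normal(size=n)

# Basis functions phi = [x1, x1*exp(-x2)]; the model is linear in (beta1, beta2)
Phi = np.column_stack([x1, x1 * np.exp(-x2)])
beta, *_ = np.linalg.lstsq(Phi, y, rcond=None)
A_hat, B_hat = beta[0], beta[1] / beta[0]   # invert: A = beta1, B = beta2/A
print(np.allclose([A_hat, B_hat], [A_true, B_true], atol=0.01))   # True
```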

Page 54: Unit 3 Multiple Linear Regression

Example: Learning Linear Systems
• Linear system: y_t = a₁y_{t−1} + ⋯ + a_M y_{t−M} + b₀x_t + ⋯ + b_L x_{t−L} + w_t
• Transfer function:

  H(z) = (b₀ + ⋯ + b_L z^{−L}) / (1 − a₁z^{−1} − ⋯ − a_M z^{−M})

• Given an input sequence and an output sequence for T samples, how do we determine β = [a₁, ⋯, a_M, b₀, ⋯, b_L]ᵀ?
• Can be solved using linear regression!
• Write y = Aβ + w and define A, y
  ◦ See the homework problem
• Many applications:
  ◦ Learning dynamics in robots / mechanical systems
  ◦ Modeling responses in neural systems
  ◦ Stock market time series
  ◦ Speech modeling: fit a model every 25 ms
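As a hedged illustration of the idea (not the homework construction itself), here is the purely autoregressive special case with no input terms, y_t = a₁y_{t−1} + a₂y_{t−2} + w_t, where each row of the regression matrix holds the lagged outputs; the coefficients a₁ = 0.5, a₂ = −0.3 are my own choices:

```python
import numpy as np

# Sketch: estimating AR coefficients y_t = a1*y_{t-1} + a2*y_{t-2} + w_t
# via linear regression (a special case of the system on the slide)
rng = np.random.default_rng(6)
a1_true, a2_true = 0.5, -0.3
T = 2000
y = np.zeros(T)
for t in range(2, T):
    y[t] = a1_true * y[t - 1] + a2_true * y[t - 2] + rng.normal(scale=0.1)

# Row for target y_t holds the lagged outputs [y_{t-1}, y_{t-2}]
A = np.column_stack([y[1:-1], y[:-2]])
target = y[2:]
a_hat, *_ = np.linalg.lstsq(A, target, rcond=None)
print(np.allclose(a_hat, [a1_true, a2_true], atol=0.1))   # True
```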

Page 55: Unit 3 Multiple Linear Regression

One-Hot Encoding
• Suppose that one feature xⱼ is a categorical variable.
• Example: Predict the price of a car, y, given the model x₁ and interior space x₂
  ◦ Suppose there are 3 different models of car (Ford, BMW, GM)
  ◦ Bad idea: arbitrarily assign an index to each possible car model
  ◦ Can give unreasonable relations
• One-hot encoding example:
  ◦ With 3 possible categories, represent x₁ using 3 binary features (φ₁, φ₂, φ₃):

    Model | φ₁ | φ₂ | φ₃
    Ford  | 1  | 0  | 0
    BMW   | 0  | 1  | 0
    GM    | 0  | 0  | 1

  ◦ Model: ŷ = β₀ + β₁φ₁ + β₂φ₂ + β₃φ₃ + β₄x₂
  ◦ Essentially obtain 3 different models:
    ◦ Ford: ŷ = β₀ + β₁ + β₄x₂
    ◦ BMW: ŷ = β₀ + β₂ + β₄x₂
    ◦ GM: ŷ = β₀ + β₃ + β₄x₂
  ◦ Allows different intercepts (or mean values) for different categories!
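The encoding table above can be built with a few lines of NumPy; the short list of car models below is a made-up sample (scikit-learn's `OneHotEncoder` offers the same functionality for real pipelines):

```python
import numpy as np

# Sketch: one-hot encoding the categorical car-model feature from the slide
models = np.array(["Ford", "BMW", "GM", "Ford", "GM"])
categories = ["Ford", "BMW", "GM"]

# Build the binary indicator features phi1, phi2, phi3 (one column per category)
Phi = np.column_stack([(models == c).astype(float) for c in categories])
print(Phi)
# [[1. 0. 0.]
#  [0. 1. 0.]
#  [0. 0. 1.]
#  [1. 0. 0.]
#  [0. 0. 1.]]
```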

Page 56: Unit 3 Multiple Linear Regression

Lab: Robot Calibration
• Predict the current draw
  ◦ Needed to predict power consumption
• Predictors:
  ◦ Joint angles, velocity and acceleration
  ◦ Strain gauge readings (a measure of load)
• Full website at TU Dortmund, Germany:
  ◦ http://www.rst.e-technik.tu-dortmund.de/cms/en/research/robotics/TUDOR_engl/index.html
  ◦ http://www.rst.e-technik.tu-dortmund.de/forschung/robot-toolbox/MERIt/MERIt_Documentation.pdf