CSE217 INTRODUCTION TO DATA SCIENCE
Spring 2019, Marion Neumann
LECTURE 4: REGRESSION
RECAP: DATA SCIENCE
…solving problems with data…
[Diagram: scientific or business problem → collect & understand data → clean & format data → use data to create solution]
…which step is most exciting?
Machine Learning
RECAP: ML
• data: anything you can measure or record
• model: specification of a (mathematical) relationship between different variables
• evaluation: how well does the model work?
…creating and using models that learn from data…
RECAP: ML WORKFLOW
• Training phase, test phase, and evaluation phase
→ turn to your neighbor
• by taking turns, explain what happens in the
  • training phase
  • test phase
  • evaluation phase
• carefully define what kinds of data are used in each phase
[Diagram labels: data, output, program, ground truth, performance measure]
PROPERTY SALES DATA
Goal: predict how much my house is worth
• features (input variables)
  size (in sq. ft): o numeric o categorical o binary
  neighborhood: o numeric o categorical o binary
  # bed rooms: o numeric o categorical o binary
  # bath rooms: o numeric o categorical o binary
  pool: o numeric o categorical o binary
  age (in years): o numeric o categorical o binary
  renovated: o numeric o categorical o binary
• house price = target variable
  o numeric o categorical o binary
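One possible way to think about the three feature types is sketched below in Python. The record `house`, its values, and the helper `feature_kind` are hypothetical illustrations, not part of the lecture; the mapping from Python types to feature types is an assumption that only holds for this toy encoding.

```python
def feature_kind(value):
    """Guess the feature type from how the value is stored (toy rule)."""
    # bool must be checked first: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "binary"
    if isinstance(value, (int, float)):
        return "numeric"
    return "categorical"

# Hypothetical example record; all values are made up for illustration.
house = {
    "size_sqft": 1800,
    "neighborhood": "Downtown",
    "bedrooms": 3,
    "bathrooms": 2,
    "pool": False,
    "age_years": 25,
    "renovated": True,
}

kinds = {name: feature_kind(value) for name, value in house.items()}
for name, kind in kinds.items():
    print(f"{name}: {kind}")
```

The target variable (house price) would be numeric under the same rule, which is what makes this a regression problem.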
How can this data help?
PREDICTING HOUSE PRICES
• target (house price) is a real number
How much is my house worth?
Look at Zillow!
LINEAR REGRESSION MODEL
TRAINING: MINIMIZE ERROR
PDSH p391
Linear Regression
math & statistics
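A minimal sketch of what "minimize error" can mean for a one-feature linear model: choose slope and intercept so that the sum of squared errors on the training data is as small as possible. The closed-form least-squares solution is used here; the function names and the toy data (sizes in sq. ft, prices in $1000s) are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept for one feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance-like and variance-like sums (unnormalized).
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def sum_squared_error(xs, ys, slope, intercept):
    """Training error that the fit minimizes."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys))

# Hypothetical training data: house size (sq. ft) and price (in $1000s).
sizes = [1000, 1500, 2000, 2500, 3000]
prices = [200, 270, 330, 410, 460]

slope, intercept = fit_line(sizes, prices)
sse = sum_squared_error(sizes, prices, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.1f}, SSE={sse:.1f}")
```

Any other line, for example one with a slightly different slope, gives a larger sum of squared errors on the same training data.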
PREDICTION: USE MODEL
PDSH p391
Linear Regression
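Once a linear model is trained, prediction is just plugging a new input into the learned line. The sketch below assumes a one-feature model; the parameter values and the house size are hypothetical numbers, not results from the lecture.

```python
def predict(size_sqft, slope, intercept):
    """Predicted price for a house of the given size under a linear model."""
    return slope * size_sqft + intercept

# Hypothetical learned parameters (price in $1000s per sq. ft, base price in $1000s).
slope, intercept = 0.132, 70.0

my_house_sqft = 1800  # hypothetical new house, not in the training data
estimate = predict(my_house_sqft, slope, intercept)
print(f"Estimated price: ${estimate:.1f}k")
```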
HOW ABOUT MORE COMPLEX MODELS?
PDSH p393
Linear Regression
Error on training set: linear model >> quadratic >> 6th-order polynomial ← error is zero!
Is the model with zero (training) error the best?
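The question can be explored with a small experiment, sketched here under stated assumptions. With 7 training points, the 6th-order polynomial that passes through all of them has zero training error; it is evaluated here by Lagrange interpolation, which is a convenient stand-in and not necessarily how the lecture figure was produced. The toy data (roughly y = x + 1 plus noise) and all function names are hypothetical.

```python
import math

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def interpolate(train_x, train_y, x):
    """Value at x of the unique degree-(n-1) polynomial through all n training points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(train_x, train_y)):
        term = yi
        for j, xj in enumerate(train_x):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def rmse(predictions, targets):
    n = len(targets)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n)

# Hypothetical data: 7 noisy training points, 4 held-out test points.
train_x = [0, 1, 2, 3, 4, 5, 6]
train_y = [1.0, 2.7, 2.9, 4.4, 4.8, 6.3, 6.9]
test_x = [0.5, 2.5, 4.5, 6.5]
test_y = [1.5, 3.5, 5.5, 7.5]

slope, intercept = fit_line(train_x, train_y)
line_train_rmse = rmse([slope * x + intercept for x in train_x], train_y)
line_test_rmse = rmse([slope * x + intercept for x in test_x], test_y)
poly_train_rmse = rmse([interpolate(train_x, train_y, x) for x in train_x], train_y)
poly_test_rmse = rmse([interpolate(train_x, train_y, x) for x in test_x], test_y)

print(f"linear:    train RMSE={line_train_rmse:.3f}, test RMSE={line_test_rmse:.3f}")
print(f"6th-order: train RMSE={poly_train_rmse:.3f}, test RMSE={poly_test_rmse:.3f}")
```

On this toy data the polynomial fits the training points exactly but does worse than the line on the held-out points, which is the reason training error alone is not a good way to pick a model.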
EVALUATION FOR REGRESSION
• Training Error vs. Test Error
• Error measures:
  • RMSE: root mean squared error
  • MAE: mean absolute error
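$\mathrm{RMSE}(\hat{\mathbf{y}}, \mathbf{y}_{\text{test}}) = \sqrt{\frac{1}{n} \sum_{i} (\hat{y}_i - y_i)^2}$
$\mathrm{MAE}(\hat{\mathbf{y}}, \mathbf{y}_{\text{test}}) = \frac{1}{n} \sum_{i} |\hat{y}_i - y_i|$
$\hat{\mathbf{y}} = f(\mathbf{X}_{\text{test}})$: predictions for test data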
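Both error measures translate directly into a few lines of Python. This is a minimal sketch using only the standard library; the prediction and ground-truth values (in $1000s) are hypothetical.

```python
import math

def rmse(predictions, targets):
    """Root mean squared error: square root of the average squared difference."""
    n = len(targets)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n)

def mae(predictions, targets):
    """Mean absolute error: average absolute difference."""
    n = len(targets)
    return sum(abs(p - t) for p, t in zip(predictions, targets)) / n

# Hypothetical test-set predictions and ground-truth prices (in $1000s).
y_pred = [310.0, 250.0, 420.0]
y_test = [300.0, 260.0, 400.0]

print(f"RMSE = {rmse(y_pred, y_test):.2f}")
print(f"MAE  = {mae(y_pred, y_test):.2f}")
```

Because the errors are squared before averaging, RMSE penalizes the single large error (20) more heavily than MAE does, so RMSE is never smaller than MAE on the same data.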
MACHINE LEARNING WORKFLOW
• Training Phase, Test Phase, Evaluation Phase
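The three phases can be sketched end to end as follows. This is an illustrative sketch, not the course's reference code: the data set, the 75/25 split, the random seed, and the function names are all assumptions.

```python
import math
import random

def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def rmse(predictions, targets):
    n = len(targets)
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(predictions, targets)) / n)

# Hypothetical labeled data: (size in sq. ft, price in $1000s).
data = [(1000, 200), (1200, 235), (1500, 270), (1800, 300),
        (2000, 330), (2300, 380), (2500, 410), (3000, 460)]

# Split the labeled data once, before any training happens.
rng = random.Random(0)  # fixed seed so the split is reproducible
shuffled = data[:]
rng.shuffle(shuffled)
split = int(0.75 * len(shuffled))
train, test = shuffled[:split], shuffled[split:]

# Training phase: training inputs AND training outputs produce the model.
slope, intercept = fit_line([x for x, _ in train], [y for _, y in train])

# Test phase: the model sees only the test inputs and produces outputs.
predictions = [slope * x + intercept for x, _ in test]

# Evaluation phase: compare the outputs against the held-out ground truth.
test_rmse = rmse(predictions, [y for _, y in test])
print(f"test RMSE = {test_rmse:.2f}")
```

The test prices are never used for fitting; they appear only in the evaluation phase as ground truth.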
SUMMARY & READING
• Learning from Data requires a lot of math!
• Regression models are used to predict real valued targets.
• We need a test set to evaluate how well our model generalizes.
• DSFS
  • Ch11: ML (p142-144)
  • Ch14: Simple Linear Regression (p173-176)
• PDSH Ch5: ML – Linear Regression (p390-394)
• LINEAR REGRESSION BY HAND
https://www.wired.com/2011/01/linear-regression-by-hand/
SciKitLearn
understand the model
use the model in practice