Class meeting #7, Wednesday, Sept 16th
GEEN 1300 Introduction to Engineering Computing
[Figure: Viscosity of Water versus Temperature; x-axis: Temperature (degF), y-axis: Viscosity (cP)]
Spreadsheet Problem Solving
general linear regression
Polynomial models
Multilinear models
Data Analysis Regression
Trendline
nonlinear regression using Solver
Note: Section Test on Excel, Monday 9/29, 7-9 p.m., MATH 100
Homework #4 is posted, due next Wednesday
Example from last class
[Figure: CO2 Emissions for the US, 1989-2000; x-axis: Year, y-axis: CO2 Emissions (MMT Carbon)]
Use Data Analysis ToolPak
Recall that, if Data Analysis does not appear on the Data ribbon, you will need to check Analysis ToolPak in the Add-ins dialog box [if it’s not there, you will have to go back to Microsoft Office/Excel set-up]
Initial, empty Regression dialog box
Regression dialog box set up for our problem
checking Residuals will also give us model predictions
Initial (poorly formatted) Regression output display [on new worksheet]
Adjust some column widths and fix up display for appropriate significant figures
Final Display of Regression Output
[tons of info, most of which you will not understand for a couple of years]
used to judge goodness of fit
intercept and slope values
used to judge whether terms “belong” in the model
add to data graph for visual comparison with model
Judging Goodness of Fit
correlation coefficient: if close to +1 or –1, indicates strong correlation between x and y [something we already know from the original graph!]
coefficient of determination (R2): percentage of the variability in y that’s accounted for by the model
Adjusted R2: adjustment to R2 that penalizes the value for using a model with too many terms
Standard Error: gives an idea of how far off the model predictions will be
Adjusted R2 or Standard Error can be used to compare different models and choose which fits best. The higher the value of Adjusted R2 the better, the lower the value of Standard Error the better.
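A minimal sketch of how these statistics can be computed for a straight-line fit, with Python and NumPy standing in for the spreadsheet. The five data points are hypothetical, and the function name `fit_statistics` is an illustrative assumption, not something from the slides.

```python
import numpy as np

def fit_statistics(x, y):
    """Fit y = intercept + slope*x and return Excel-style summary statistics."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    n_coeffs = 2                                  # intercept and slope

    slope, intercept = np.polyfit(x, y, 1)        # least-squares straight line
    residuals = y - (intercept + slope * x)

    sse = np.sum(residuals ** 2)                  # variation the model misses
    sst = np.sum((y - y.mean()) ** 2)             # total variation in y
    r_squared = 1.0 - sse / sst
    return {
        "slope": slope,
        "intercept": intercept,
        "r": np.corrcoef(x, y)[0, 1],             # correlation coefficient
        "r_squared": r_squared,                   # coefficient of determination
        "adj_r_squared": 1.0 - (1.0 - r_squared) * (n - 1) / (n - n_coeffs),
        "std_error": np.sqrt(sse / (n - n_coeffs)),
    }

# Hypothetical data, for illustration only
stats = fit_statistics([1, 2, 3, 4, 5], [1, 3, 2, 5, 4])
for name, value in stats.items():
    print(f"{name:>14s} = {value:.4f}")
```

Adjusted R2 divides by n minus the number of fitted coefficients, which is why adding terms that barely reduce the error can lower it.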
Judging whether terms belong in the model
P-values estimate the probability that the true value of the coefficient could be zero
P-values that are quite small, like these, indicate that there is little question about the significance of the term coefficients. In our case here, that means that both the intercept term and the slope term belong in the model.
A P-value of 5% (0.05) or greater causes suspicion that the coefficient may not be significant and that the term should probably be dropped from the model
The Data Analysis Regression tool appears much more complicated and involved than the shortcut Trendline tool, so . . .
Why use Data Analysis Regression?
1) It provides more information that lets us judge the goodness of fit and significance of model terms
2) It can handle model forms that cannot be handled by Trendline
So, generally, when using Excel, we prefer the Data Analysis Regression tool over Trendline
but Trendline is still quite good for “quick and dirty” looks at the data
Learn to use both!
More complicated models
Polynomial models
$y = a + b x + c x^2 + d x^3 + \cdots$
General linear models
$y = a\, f_1(x) + b\, f_2(x) + c\, f_3(x) + d\, f_4(x) + \cdots$
Note: it is called linear regression, even when there are nonlinear terms in x, because the terms are linear in the model parameters, a, b, c, etc.
Examples: polynomial models above
$y = a + b\,\frac{1}{x} + c \ln x$
Multilinear models
$y = a\, f_1(x_1, x_2, \ldots) + b\, f_2(x_1, x_2, \ldots) + c\, f_3(x_1, x_2, \ldots) + \cdots$
Examples: $y = a + b x_1 + c x_2 + d x_1 x_2$
$y = a\, e^{x_1 / x_2}$
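A sketch of why a general linear model is still a linear regression: each basis function of x becomes one column, and the parameters are found by ordinary least squares. This uses NumPy rather than Excel, and the parameter values and x range are hypothetical, chosen only to generate illustration data for the model with 1/x and ln x terms.

```python
import numpy as np

# Hypothetical "true" parameters, used only to generate illustration data
a_true, b_true, c_true = 2.0, -1.5, 0.75

x = np.linspace(1.0, 10.0, 12)
y = a_true + b_true / x + c_true * np.log(x)

# One column per basis function: 1, 1/x, ln(x).
# The columns are nonlinear in x, but the model is linear in a, b, c.
design = np.column_stack([np.ones_like(x), 1.0 / x, np.log(x)])

coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
a_fit, b_fit, c_fit = coeffs
print(f"a = {a_fit:.4f}, b = {b_fit:.4f}, c = {c_fit:.4f}")
```

In the spreadsheet, the same idea amounts to adding columns for 1/x and ln x and selecting them together as the X input range.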
Nonlinear models
Transformable to linear
$y = a\, e^{b x} \;\Rightarrow\; \ln y = \ln a + b x$ (straight-line regression!)
Not transformable to linear
$P = 10^{A - \frac{B}{T + C}} \;\Rightarrow\; \log P = A - \frac{B}{T + C}$
We can use the Data Analysis Regression tool for everything except the nonlinear models that can’t be transformed into linear. For those, we can use the Solver.
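A short sketch of the transformable case, the exponential model y = a e^(bx): regress ln y on x as a straight line, then recover a from the intercept. The values of a and b here are hypothetical and serve only to generate test data.

```python
import numpy as np

# Hypothetical parameters used to generate illustration data
a_true, b_true = 2.5, -0.3

x = np.arange(0.0, 10.0)
y = a_true * np.exp(b_true * x)

# Straight-line regression of ln(y) on x: slope is b, intercept is ln(a)
slope, intercept = np.polyfit(x, np.log(y), 1)
a_fit = np.exp(intercept)
b_fit = slope
print(f"a = {a_fit:.4f}, b = {b_fit:.4f}")
```

Note that this minimizes the squared errors in ln y rather than in y, so with noisy data the result can differ somewhat from a direct nonlinear fit.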
Example: polynomial regression
curvature evident
[Figure: Viscosity of Water versus Temperature; x-axis: Temperature (degF), y-axis: Viscosity (cP)]
Setting up for polynomial fits
Select these for a quadratic model, etc.
Data Analysis Regression tool
check Labels because headings are included in selections for Y and X
check Residuals
Quadratic model regression results
model performance: adjR2
copy to graph
model coefficients
Quadratic model really doesn’t “capture” behavior of data
[Figure: Viscosity of Water versus Temperature; x-axis: Temperature (degF), y-axis: Viscosity (cP)]
Plot of residuals vs temperature looks systematic, showing that model is inadequate
[Figure: residuals versus temperature]
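One rough way to see a systematic residual pattern without a plot is to look at the signs of the residuals in order of temperature: long runs of the same sign suggest the model is missing curvature. The sketch below fits a quadratic to hypothetical curved data (not the viscosity data from the slides) and counts sign changes; the data-generating formula is an assumption made for illustration.

```python
import numpy as np
from numpy.polynomial import Polynomial

# Hypothetical curved data (NOT the viscosity data from the slides)
temperature = np.linspace(40.0, 200.0, 17)
response = 0.3 + 1.5 * np.exp(-0.012 * (temperature - 40.0))

# Quadratic least-squares fit and its residuals
quadratic = Polynomial.fit(temperature, response, 2)
residuals = response - quadratic(temperature)

# Count how often the residual sign flips from one point to the next
signs = np.sign(residuals)
sign_changes = int(np.count_nonzero(np.diff(signs) != 0))

print("residual signs:", "".join("+" if s > 0 else "-" for s in signs))
print("sign changes:", sign_changes, "of", len(residuals) - 1, "possible")
```

Random scatter would be expected to flip sign far more often than this; a few long runs of one sign are the numerical counterpart of the arched pattern in the residual plot.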
Continue with fits of cubic, 4th- & 5th-order polynomials
Summary of results
P-values for the model coefficients are listed under Intercept through x5

Model Order  AdjR2   Standard Error  Intercept  x         x2        x3        x4        x5
2            98.05%  0.0663          9.98E-11   1.05E-07  4.07E-06
3            99.80%  0.0210          2.56E-12   4.62E-09  3.50E-07  5.49E-06
4            99.98%  0.0075          2.01E-12   4.19E-09  3.71E-07  6.47E-06  4.47E-05
5            99.99%  0.0039          6.77E-11   1.02E-07  7.72E-06  1.17E-04  6.97E-04  2.34E-03

Looks like 5th-order offers best performance, but improvement is marginal over 4th-order, so choose 4th-order.
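The order comparison can be sketched in code as a loop over polynomial orders, recording Adjusted R2 and Standard Error for each. The data here are hypothetical (a smooth decaying curve, not the viscosity measurements), and `adj_r2_and_std_error` is an illustrative helper name.

```python
import numpy as np
from numpy.polynomial import Polynomial

def adj_r2_and_std_error(x, y, order):
    """Least-squares polynomial fit; returns (adjusted R2, standard error)."""
    n = len(y)
    n_coeffs = order + 1                       # includes the intercept
    model = Polynomial.fit(x, y, order)
    sse = np.sum((y - model(x)) ** 2)
    sst = np.sum((y - np.mean(y)) ** 2)
    r2 = 1.0 - sse / sst
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - n_coeffs)
    std_error = np.sqrt(sse / (n - n_coeffs))
    return float(adj_r2), float(std_error)

# Hypothetical smooth, curved data (NOT the viscosity data from the slides)
temperature = np.linspace(40.0, 200.0, 17)
response = 0.3 + 1.5 * np.exp(-0.012 * (temperature - 40.0))

results = {order: adj_r2_and_std_error(temperature, response, order)
           for order in (2, 3, 4, 5)}
for order, (adj_r2, std_error) in results.items():
    print(f"order {order}: adjR2 = {adj_r2:.6f}, std error = {std_error:.2e}")
```

The choice of order is still a judgment call: the numbers show how much each extra term buys, and the lowest order with an acceptable fit is usually preferred.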
Resulting model (Visc in cP, T in degF):
$\text{Visc} = 3.161 - 0.05699\,T + 5.023 \times 10^{-4}\,T^2 - 2.162 \times 10^{-6}\,T^3 + 3.593 \times 10^{-9}\,T^4$
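A small sketch for evaluating the fourth-order viscosity model, using the coefficients quoted on the slides. The function name `viscosity_cp` is an illustrative assumption, and the coefficients are rounded to four significant figures, so predictions carry that rounding.

```python
def viscosity_cp(temp_degf):
    """Fourth-order model from the slides: viscosity in cP, temperature in degF."""
    # Coefficients in order of increasing power of T, as quoted on the slides
    coeffs = (3.161, -5.699e-2, 5.023e-4, -2.162e-6, 3.593e-9)
    result = 0.0
    for c in reversed(coeffs):       # Horner's rule: nested multiplication
        result = result * temp_degf + c
    return result

for t in (40, 100, 200):
    print(f"T = {t:3d} degF -> {viscosity_cp(t):.3f} cP")
```

At 100 degF this evaluates to about 0.682 cP. The model should only be trusted within the temperature range of the data it was fitted to.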
[Figure: Viscosity of Water versus Temperature; x-axis: Temperature (degF), y-axis: Viscosity (cP)]
Residuals still somewhat patterned with temperature, but much, much smaller
[Figure: residuals versus temperature]
Using Trendline, instead of Data Analysis Regression
Set for polynomial, Order: 4
Display equation on chart
[Figure: Viscosity of Water versus Temperature with Trendline; x-axis: Temperature (degF), y-axis: Viscosity (cP)]
y = 3.593E-09x^4 - 2.162E-06x^3 + 5.023E-04x^2 - 5.699E-02x + 3.161E+00
Precautions on polynomial fitting
Try to use the lowest-order model that gives a good fit.
Higher-order models will have “wiggles” between data points that will cause prediction errors.
In fact, an (n-1)th-order polynomial will provide a perfect fit to the n data points, but it will usually do bizarre things in between the data points.
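A sketch of the perfect-fit-but-bizarre behavior, using Runge's function, a standard textbook test case rather than data from the slides. Eleven equally spaced points are fitted with a 10th-order polynomial, and the fit is then checked both at the data points and in between them.

```python
import numpy as np
from numpy.polynomial import Polynomial

def runge(x):
    """Runge's function, a smooth curve often used to show interpolation trouble."""
    return 1.0 / (1.0 + 25.0 * x ** 2)

n_points = 11
x_data = np.linspace(-1.0, 1.0, n_points)
y_data = runge(x_data)

# An (n-1)th-order polynomial has n coefficients, so it can hit all n points
poly = Polynomial.fit(x_data, y_data, n_points - 1)
max_error_at_data = float(np.max(np.abs(poly(x_data) - y_data)))

# Now look in between the data points
x_fine = np.linspace(-1.0, 1.0, 401)
max_error_between = float(np.max(np.abs(poly(x_fine) - runge(x_fine))))

print(f"largest error at the data points:  {max_error_at_data:.2e}")
print(f"largest error between data points: {max_error_between:.2f}")
```

The error at the data points is essentially zero, while the error between points near the ends of the range is larger than the function values themselves.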
Example: multi-linear regression
Model 1: $y = a + b x_1 + c x_2$
Model 2: $y = b x_1 + c x_2$
X-input range includes two independent variables: x1 and x2
High P-value for intercept in Model 1 suggests Model 2 without intercept, but there is a significant loss in adjR2
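A sketch of fitting the two models side by side with NumPy least squares. The eight measurements are hypothetical, and `fit_linear` is an illustrative helper name. The comparison here uses the sum of squared errors and the Standard Error rather than adjR2, since R2 for a no-intercept model is defined differently by different tools.

```python
import numpy as np

def fit_linear(columns, y):
    """Least-squares fit of y to the given columns; returns (coeffs, SSE, std error)."""
    design = np.column_stack(columns)
    coeffs, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coeffs
    sse = float(np.sum(residuals ** 2))
    std_error = float(np.sqrt(sse / (len(y) - design.shape[1])))
    return coeffs, sse, std_error

# Hypothetical measurements, for illustration only
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0])
y = np.array([2.8, 2.8, 5.6, 5.6, 8.7, 9.3, 11.2, 12.0])

# Model 1: y = a + b*x1 + c*x2 (the column of ones carries the intercept a)
coeffs_1, sse_1, std_error_1 = fit_linear([np.ones_like(x1), x1, x2], y)
# Model 2: y = b*x1 + c*x2 (no intercept column)
coeffs_2, sse_2, std_error_2 = fit_linear([x1, x2], y)

print("Model 1:", coeffs_1, "SSE =", sse_1, "std error =", std_error_1)
print("Model 2:", coeffs_2, "SSE =", sse_2, "std error =", std_error_2)
```

Dropping the intercept can never reduce SSE, because Model 2 is a special case of Model 1 with a = 0; the question is whether the increase is large enough to matter.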
Multilinear Model Performance
[Figure: Predicted y versus Measured y for Model 1 and Model 2]
Model performance isn’t that great for either model, and Model 1 doesn’t appear dramatically better than Model 2
Note: for multi-linear models, we plot Predicted vs Measured y. A perfect model would place points directly on the 45-degree line.
Nonlinear Regression
Fitting the parameters of the van der Waals’ equation of state
Data for SO2
$P = \frac{RT}{\hat{V} - b} - \frac{a}{\hat{V}^2}$
Find the values of a and b that give the best predictions for P, when compared to the measured values of P
Strategy for Nonlinear Regression
1) estimate initial values for a and b
2) compute predicted P’s using data for V̂ and T
3) compute errors between predicted P’s and measured P’s
4) sum the squares of these errors to compute SSE
5) have the Solver minimize SSE by adjusting the values of a and b
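The five steps can be sketched outside the spreadsheet as follows. This is not the Solver's own algorithm: Gauss-Newton iteration is swapped in as a simple stand-in for the minimization step. The "measurements" are synthetic, generated from assumed values of a and b (not the SO2 data from the slides), and all function names are illustrative.

```python
import numpy as np

R = 8.314  # gas constant, J/(mol K)

def vdw_pressure(v, t, a, b):
    """van der Waals pressure in Pa; v in m^3/mol, t in K."""
    return R * t / (v - b) - a / v ** 2

def sum_squared_errors(v, t, p_measured, a, b):
    """Steps 2-4: predict P, take errors against measured P, sum their squares."""
    errors = p_measured - vdw_pressure(v, t, a, b)
    return float(np.sum(errors ** 2))

def fit_vdw(v, t, p_measured, a, b, iterations=25):
    """Step 5: adjust a and b to reduce SSE, using Gauss-Newton steps."""
    for _ in range(iterations):
        errors = p_measured - vdw_pressure(v, t, a, b)
        # How predicted P changes with a and with b at the current estimates
        sensitivity = np.column_stack([-1.0 / v ** 2, R * t / (v - b) ** 2])
        step, *_ = np.linalg.lstsq(sensitivity, errors, rcond=None)
        a = a + step[0]
        b = max(b + step[1], 0.0)   # crude way to respect the b >= 0 constraint
    return float(a), float(b)

# Synthetic "measurements" from assumed a and b (NOT the SO2 data from the slides)
A_TRUE, B_TRUE = 0.68, 5.64e-5      # Pa m^6/mol^2 and m^3/mol, hypothetical values
v_grid, t_grid = np.meshgrid([5e-4, 1e-3, 2e-3, 5e-3], [300.0, 400.0, 500.0])
v, t = v_grid.ravel(), t_grid.ravel()
p_measured = vdw_pressure(v, t, A_TRUE, B_TRUE)

a0, b0 = 0.0, 0.0                   # step 1: initial estimates (ideal gas)
a_fit, b_fit = fit_vdw(v, t, p_measured, a0, b0)
sse_before = sum_squared_errors(v, t, p_measured, a0, b0)
sse_after = sum_squared_errors(v, t, p_measured, a_fit, b_fit)
print(f"SSE before: {sse_before:.3e}, SSE after: {sse_after:.3e}")
print(f"a = {a_fit:.4f}, b = {b_fit:.3e}")
```

Starting from a = b = 0 makes the initial prediction the ideal gas law, so the drop in SSE also shows how much the two extra parameters improve on it for this synthetic data set.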
Basic data
Calculated Pressure by both ideal gas law and van der Waals
Sum of squares of this column
Ideal Gas Calculation
van der Waals Calculation
Error Calculation
Sum of Squares Calculation
Setting up Solver Parameters
SSE as Target Cell
Minimize
by adjusting a and b
with b >= 0 constraint
Results
Fit of van der Waals Eqn for SO2 and Comparison to Ideal Gas Law
[Figure: Predicted Pressure (Pa) versus Measured Pressure (Pa); series: van der Waals, Ideal Gas]
Note departure of ideal gas predictions at higher pressures