ACTL2002/ACTL5101 Probability and Statistics: Week 11
ACTL2002/ACTL5101 Probability and Statistics
© Katja Ignatieva
School of Risk and Actuarial Studies, Australian School of Business, University of New South Wales
Last ten weeks
Introduction to probability;
Moments: (non-)central moments, mean, variance (standard deviation), skewness & kurtosis;
Special univariate (parametric) distributions (discrete & continuous);
Joint distributions;
Convergence, with applications LLN & CLT;
Estimators (MME, MLE, and Bayesian);
Evaluation of estimators;
Interval estimation.
Final two weeks
Simple linear regression:
- Idea;
- Estimating using LSE (& BLUE estimator & relation to MLE);
- Partition of variability of the variable;
- Testing:
  i) Slope;
  ii) Intercept;
  iii) Regression line;
  iv) Correlation coefficient.

Multiple linear regression:
- Matrix notation;
- LSE estimates;
- Tests;
- R-squared and adjusted R-squared.
Multiple Linear regression
Matrix notation
  Linear Algebra and Matrix Approach
  The Model in Matrix Form
  Linear models
Statistical Properties of the Least Squares Estimates
  Statistical Properties of the Least Squares Estimates
  CI and Tests for Individual Regression Parameters
  CI and Tests for functions of Regression Parameters
Example: Multiple Linear Regression
  Example regression output
  Exercise: Multiple Linear Regression
  Example: Multiple Linear Regression
Appendix
  Simple linear regression in matrix form
Linear Algebra and Matrix Approach
In general we will consider the multiple regression problem:

$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \ldots + \beta_{p-1} x_{p-1}$$

and data points:

$$\begin{array}{ccccc}
y_1 & x_{11} & x_{12} & \ldots & x_{1,p-1} \\
y_2 & x_{21} & x_{22} & \ldots & x_{2,p-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
y_n & x_{n1} & x_{n2} & \ldots & x_{n,p-1}
\end{array}$$
Multiple Regression: Linear Algebra and Matrix Approach
Observations $y_i$ are written in a vector $y$.

Regression coefficients are the vector ($p$ by 1) $\beta = [\beta_0, \beta_1, \ldots, \beta_{p-1}]^\top$, where $\top$ indicates transpose ($\beta$ is a column vector).

The matrix $X$ (size $n$ by $p$) is:

$$X = \begin{bmatrix}
1 & x_{11} & x_{12} & \ldots & x_{1,p-1} \\
1 & x_{21} & x_{22} & \ldots & x_{2,p-1} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & x_{n2} & \ldots & x_{n,p-1}
\end{bmatrix}$$

Predicted values are:

$$\hat{y} = X\hat\beta.$$
Multiple Regression: Linear Algebra and Matrix Approach
The least squares problem is to select $\beta$ to minimize:

$$S(\beta) = (y - X\beta)^\top (y - X\beta).$$

Proof: see next slides.

Differentiating with respect to each of the $\beta$'s, the normal equations become:

$$X^\top X \beta = X^\top y.$$

If $X^\top X$ is non-singular then the parameter estimates are:

$$\hat\beta = (X^\top X)^{-1} X^\top y.$$

The residuals are:

$$\hat\varepsilon = y - \hat{y} = y - X\hat\beta.$$
The least squares problem is to find the vector $\beta$ that minimizes:

$$S(\beta) = \sum_{i=1}^n \varepsilon_i^2 = \sum_{i=1}^n (y_i - \hat y_i)^2 = \sum_{i=1}^n \left(y_i - \beta_0 - \beta_1 x_{i1} - \ldots - \beta_{p-1} x_{i,p-1}\right)^2 = (y - X\beta)^\top (y - X\beta).$$

Derivation of the least squares estimator:

$$\begin{aligned}
0 &= \frac{\partial}{\partial\beta} (y - X\beta)^\top (y - X\beta) \\
  &= \frac{\partial}{\partial\beta} \left(y^\top y - 2(X^\top y)^\top \beta + \beta^\top X^\top X \beta\right) \\
  &= -2X^\top y + X^\top X\beta + (X^\top X)^\top \beta \\
  &= -2X^\top y + 2X^\top X\beta \\
\Rightarrow\quad X^\top y &= X^\top X\hat\beta \quad\Rightarrow\quad \hat\beta = (X^\top X)^{-1} X^\top y.
\end{aligned}$$
The Least Squares Estimates

Differentiating this matrix expression w.r.t. $\beta$ and equating to zero leads to:

$$X^\top X \beta = X^\top Y,$$

i.e., the normal equations. If $(X^\top X)^{-1}$ exists, the solution is:

$$\hat\beta = (X^\top X)^{-1} X^\top Y.$$

The corresponding vector of fitted (or predicted) values of $y$ is:

$$\hat{Y} = X\hat\beta$$

and the vector of residuals:

$$\hat\varepsilon = Y - \hat{Y} = Y - X\hat\beta$$

gives the differences between the observed and fitted values.
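As a quick numerical sketch of these formulas (the data below are invented for illustration, not from the course), the normal equations can be solved directly and checked against numpy's least-squares routine:

```python
import numpy as np

# Hypothetical small dataset: n = 5 observations, p = 3 columns
# (intercept plus two regressors).
X = np.array([[1.0, 1.0, 2.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 4.0],
              [1.0, 4.0, 3.0],
              [1.0, 5.0, 5.0]])
y = np.array([3.0, 4.0, 8.0, 9.0, 12.0])

# Normal equations: X'X beta = X'y, hence beta = (X'X)^{-1} X'y.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)

# Fitted values and residuals.
y_fit = X @ beta_hat
resid = y - y_fit

# Cross-check against numpy's least-squares routine.
beta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
assert np.allclose(beta_hat, beta_lstsq)

# The normal equations say the residuals are orthogonal to the columns of X.
assert np.allclose(X.T @ resid, 0.0)
```

Solving $X^\top X\beta = X^\top y$ with `np.linalg.solve` avoids forming the explicit inverse, which is numerically preferable to computing $(X^\top X)^{-1}$ directly.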
The Model in Matrix Form
Consider the regression model of the form:

$$y = \beta_0 + \beta_1 x_1 + \ldots + \beta_{p-1} x_{p-1} + \varepsilon.$$

Fitted to data, the model becomes:

$$y_i = \beta_0 + \beta_1 x_{i1} + \ldots + \beta_{p-1} x_{i,p-1} + \varepsilon_i, \quad \text{for } i = 1, 2, \ldots, n.$$

Define the vectors:

$$\underset{[n\times 1]}{Y} = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \quad
\underset{[p\times 1]}{\beta} = \begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_{p-1} \end{bmatrix}, \quad \text{and} \quad
\underset{[n\times 1]}{\varepsilon} = \begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}.$$
The Model in Matrix Form
Together with the matrix:

$$\underset{[n\times p]}{X} = \begin{bmatrix}
1 & x_{11} & \ldots & x_{1,p-1} \\
1 & x_{21} & \ldots & x_{2,p-1} \\
\vdots & \vdots & \ddots & \vdots \\
1 & x_{n1} & \ldots & x_{n,p-1}
\end{bmatrix},$$

write the model in matrix form as follows:

$$\underset{[n\times 1]}{Y} = \underset{[n\times p]}{X}\;\underset{[p\times 1]}{\beta} + \underset{[n\times 1]}{\varepsilon}.$$

The fitted value is:

$$\underset{[n\times 1]}{\hat Y} = \underset{[n\times p]}{X}\;\underset{[p\times 1]}{\hat\beta}.$$
Introduction
To apply linear regression properly:

- Effects of the covariates (explanatory variables) must be additive;
- Homoskedastic (constant) variance (otherwise use an AutoRegressive Conditional Heteroskedasticity (ARCH) model, from Robert Engle; 2003 Nobel prize in Economics);
- Errors must be independent of the explanatory variables with mean zero (weak assumptions);
- Errors must be Normally distributed, and hence symmetric (only in the case of testing, i.e., strong assumptions).
Linear models in general
A linear model involves a response variable datum, $y_i$, treated as an observation on a random variable, $(Y_i \mid X = x)$, where $\mathrm{E}[Y_i \mid X = x] \equiv \mu_i$, the $\varepsilon_i$'s are zero-mean random variables independent of $X$, and the $\beta_i$'s are model parameters, whose values are unknown and need to be estimated using data.

The following are examples of linear models:

- Affine form: $\mu_i = \beta_0 + x_i\beta_1$;
- Polynomial (cubic) form: $\mu_i = \beta_0 + x_i\beta_1 + x_i^2\beta_2 + x_i^3\beta_3$;
- Affine form with interaction terms: $\mu_i = \beta_0 + x_i\beta_1 + z_i\beta_2 + (x_i z_i)\beta_3$.

For all linear forms we have: $Y_i = \mu_i + \varepsilon_i$.
Linear models
The first model can be re-written in matrix-vector form as:

$$\begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \vdots \\ \mu_n \end{bmatrix} =
\underbrace{\begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ 1 & x_3 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}}_{X}
\begin{bmatrix} \beta_0 \\ \beta_1 \end{bmatrix} = [\mathbf{1}_n\ X]\,\beta.$$

So the model has the general form $\mu = X\beta$, i.e., the expected value vector $\mu$ is given by a model matrix (or design matrix), $X$, multiplied by a parameter vector, $\beta$.

All linear models can be written in this general form.
Linear models
The second model (the cubic) given above can be written in matrix-vector form as:

$$\begin{bmatrix} \mu_1 \\ \mu_2 \\ \mu_3 \\ \vdots \\ \mu_n \end{bmatrix} =
\underbrace{\begin{bmatrix}
1 & x_1 & x_1^2 & x_1^3 \\
1 & x_2 & x_2^2 & x_2^3 \\
1 & x_3 & x_3^2 & x_3^3 \\
\vdots & \vdots & \vdots & \vdots \\
1 & x_n & x_n^2 & x_n^3
\end{bmatrix}}_{X}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \\ \beta_3 \end{bmatrix}.$$
Models in which the data are divided into different groups, each of which is assumed to have a different mean, are less obviously of the form $\mu = X\beta$, but they can be written like this using dummy variables.

Consider the model:

$$y_i = \beta_j + \varepsilon_i \quad \text{if observation } i \text{ is in group } j,$$

and suppose there are three groups, each with two data points. Then the model can be re-written:

$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \\ y_4 \\ y_5 \\ y_6 \end{bmatrix} =
\underbrace{\begin{bmatrix}
1 & 0 & 0 \\
1 & 0 & 0 \\
0 & 1 & 0 \\
0 & 1 & 0 \\
0 & 0 & 1 \\
0 & 0 & 1
\end{bmatrix}}_{X}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \beta_2 \end{bmatrix} + \varepsilon.$$
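The dummy-variable construction above can be sketched numerically (the group labels and responses below are made up for illustration):

```python
import numpy as np

# Hypothetical group labels: three groups, two observations each,
# matching the 6x3 dummy design on the slide.
groups = np.array([0, 0, 1, 1, 2, 2])
y = np.array([1.0, 3.0, 10.0, 12.0, 20.0, 24.0])

# Build X with one dummy column per group: X[i, j] = 1 iff obs i is in group j.
X = (groups[:, None] == np.arange(3)[None, :]).astype(float)

# LSE via the normal equations.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
# With this design, each beta_j is simply the sample mean of group j.
```

Here $X^\top X$ is diagonal (the groups do not overlap), so the least squares estimate of each $\beta_j$ reduces to the mean of the observations in group $j$.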
Marginal effects
Assume that we have the multiple regression model of the form:

$$y = \beta_0 + \beta_1 x_1 + \ldots + \beta_{p-1} x_{p-1} + \varepsilon.$$

Assume that $x_k$ is a continuous variable, so that if we increase it by one unit while holding the values of the other variables fixed, the value of $y$ becomes:

$$y_{\text{new}} = \beta_0 + \beta_1 x_1 + \ldots + \beta_k (x_k + 1) + \ldots + \beta_{p-1} x_{p-1} + \varepsilon.$$

Since $\mathrm{E}[\varepsilon] = 0$, the marginal effect of $x_k$:

$$\beta_k = \mathrm{E}[y_{\text{new}}] - \mathrm{E}[y],$$

is therefore the expected increase (or decrease) in the value of $y$ whenever you increase the value of $x_k$ by one unit.
Assumptions
The residual terms $\varepsilon_i$ satisfy the following:

- $\mathrm{E}[\varepsilon_i \mid X = x] = 0$, for $i = 1, 2, \ldots, n$;
- $\mathrm{Var}(\varepsilon_i \mid X = x) = \sigma^2$, for $i = 1, 2, \ldots, n$;
- $\mathrm{Cov}(\varepsilon_i, \varepsilon_j \mid X = x) = 0$, for all $i \neq j$.

In words: the residuals have zero mean, common variance, are uncorrelated with the explanatory variables, and are independent of the other residuals.

In matrix form, we have:

$$\mathrm{E}[\varepsilon] = 0; \qquad \mathrm{Cov}(\varepsilon) = \sigma^2 I_n,$$

where $I_n$ is a matrix of size $n \times n$ with ones on the diagonal and zeros on the off-diagonal elements.
Statistical Properties of the Least Squares Estimates

The following properties of the least squares estimates can be verified:

1. The least squares estimates are unbiased: $\mathrm{E}[\hat\beta] = \beta$.

2. The variance-covariance matrix of the least squares estimates is: $\mathrm{Var}(\hat\beta) = \sigma^2 \cdot (X^\top X)^{-1}$.

3. An unbiased estimate of $\sigma^2$ is:

$$s^2 = \frac{1}{n-p}\left(y - \hat y\right)^\top \left(y - \hat y\right).$$

Note that:

$$\frac{(n-p) \cdot S^2}{\sigma^2} \sim \chi^2(n-p),$$

and $\hat\beta$ and $S^2$ are independent.
Statistical Properties of the Least Squares Estimates
4. Each component $\hat\beta_k$ is normally distributed with mean:

$$\mathrm{E}[\hat\beta_k] = \beta_k,$$

and variance:

$$\mathrm{Var}(\hat\beta_k) = \sigma^2 \cdot c_{kk},$$

where $c_{kk}$ is the $(k+1)$th diagonal entry of the matrix $C = (X^\top X)^{-1}$ (because $c_{11}$ corresponds to the constant), and the covariance between $\hat\beta_k$ and $\hat\beta_l$ is:

$$\mathrm{Cov}(\hat\beta_k, \hat\beta_l) = \sigma^2 \cdot c_{kl}.$$
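Properties 2 and 3 above can be sketched numerically; the simulated dataset below is hypothetical, chosen only to exercise the formulas:

```python
import numpy as np

# Hypothetical data: n = 50 observations, p = 3 parameters.
rng = np.random.default_rng(0)
n, p = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.3, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

# Unbiased estimate of sigma^2: s^2 = (y - y_hat)'(y - y_hat) / (n - p).
s2 = resid @ resid / (n - p)

# Estimated covariance matrix of beta_hat: s^2 * (X'X)^{-1};
# standard errors are the square roots of its diagonal entries c_kk.
C = np.linalg.inv(X.T @ X)
cov_beta = s2 * C
se_beta = np.sqrt(np.diag(cov_beta))
```

In repeated simulations of this kind, averaging `beta_hat` over many draws approaches `beta_true` (property 1), and `s2` averages to the true error variance.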
CI and Tests for Individual Regression Parameters
The standard error of $\hat\beta_k$ is estimated using:

$$se(\hat\beta_k) = s\sqrt{c_{kk}}.$$

Under the normality (strong) assumption, we have:

$$\frac{\hat\beta_k - \beta_k}{se(\hat\beta_k)} \sim t(n-p).$$

A $100(1-\alpha)\%$ confidence interval for $\beta_k$ is given by:

$$\hat\beta_k \pm t_{1-\alpha/2,\,n-p} \cdot se(\hat\beta_k).$$
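This confidence interval can be sketched with scipy supplying the $t$ quantile (the data are again simulated and hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical data: n = 60 observations, p = 3 parameters.
rng = np.random.default_rng(4)
n, p = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
y = X @ beta_true + rng.normal(scale=0.4, size=n)

C = np.linalg.inv(X.T @ X)
beta_hat = C @ X.T @ y
resid = y - X @ beta_hat
s = np.sqrt(resid @ resid / (n - p))

# se(beta_k) = s * sqrt(c_kk), one value per coefficient.
se = s * np.sqrt(np.diag(C))

# 100(1 - alpha)% CI with alpha = 0.05: beta_k +/- t_{1-alpha/2, n-p} * se.
t_crit = stats.t.ppf(0.975, df=n - p)
ci_lower = beta_hat - t_crit * se
ci_upper = beta_hat + t_crit * se
```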
CI and Tests for Individual Regression Parameters

In testing the null hypothesis $H_0: \beta_k = \beta_{k0}$ for some fixed constant $\beta_{k0}$, we use the test statistic:

$$T = \frac{\hat\beta_k - \beta_{k0}}{se(\hat\beta_k)},$$

which, under the null hypothesis, has a $t$-distribution with $n-p$ degrees of freedom. The most common test is for the significance of the presence of the variable $x_k$, in which case the test statistic simply becomes:

$$T = \frac{\hat\beta_k}{se(\hat\beta_k)},$$

because we test $H_0: \beta_k = 0$ against $H_1: \beta_k \neq 0$ when we test for the significance/importance of the variable.
CI and Tests for Individual Regression Parameters

However, we can always have more general tests for the regression coefficients, as demonstrated in the three cases below:

1. Test the null hypothesis:

$$H_0: \beta_k = \beta_{k0}$$

against the alternative:

$$H_1: \beta_k \neq \beta_{k0}.$$

Use the decision rule (using the generalized LRT, week 7):

$$\text{Reject } H_0 \text{ if: } |T| = \left|\frac{\hat\beta_k - \beta_{k0}}{se(\hat\beta_k)}\right| > t_{1-\alpha/2,\,n-p}.$$
CI and Tests for Individual Regression Parameters
2. Test the hypothesis:

$$H_0: \beta_k = \beta_{k0} \quad \text{v.s.} \quad H_1: \beta_k > \beta_{k0}.$$

Use the decision rule (using UMP, week 7):

$$\text{Reject } H_0 \text{ if: } T = \frac{\hat\beta_k - \beta_{k0}}{se(\hat\beta_k)} > t_{1-\alpha,\,n-p}.$$

3. Test the hypothesis:

$$H_0: \beta_k = \beta_{k0} \quad \text{v.s.} \quad H_1: \beta_k < \beta_{k0}.$$

Use the decision rule (using UMP, week 7):

$$\text{Reject } H_0 \text{ if: } T = \frac{\hat\beta_k - \beta_{k0}}{se(\hat\beta_k)} < -t_{1-\alpha,\,n-p}.$$
CI and Tests for functions of Regression Parameters

Let $D$ be a matrix (size $m \times p$) of $m$ linear combinations of the regression parameters. Then we have:

$$\mathrm{E}[D\hat\beta] = D\beta, \qquad \mathrm{Var}(D\hat\beta) = D\,\mathrm{Var}(\hat\beta)\,D^\top = \sigma^2 D (X^\top X)^{-1} D^\top.$$

Under the normality (strong) assumption, we have (for a single linear combination, $m = 1$):

$$\frac{D(\hat\beta - \beta)}{\underbrace{\sqrt{s^2 D (X^\top X)^{-1} D^\top}}_{= se(D\hat\beta)}} \sim t(n-p).$$

A $100(1-\alpha)\%$ confidence interval for $D\beta$ is given by:

$$D\hat\beta \pm t_{1-\alpha/2,\,n-p} \cdot se(D\hat\beta).$$
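A numerical sketch of this interval for a single linear combination, here $D = [0\ 1\ {-1}]$ targeting $\beta_1 - \beta_2$ (the data are simulated and hypothetical):

```python
import numpy as np
from scipy import stats

# Hypothetical data with beta_1 = beta_2 by construction.
rng = np.random.default_rng(1)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, 2.0]) + rng.normal(scale=0.5, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y
resid = y - X @ beta_hat
s2 = resid @ resid / (n - p)

# D beta estimates beta_1 - beta_2; se(D beta) = sqrt(s^2 D (X'X)^{-1} D').
D = np.array([[0.0, 1.0, -1.0]])
se_Db = np.sqrt(s2 * (D @ XtX_inv @ D.T))[0, 0]

t_crit = stats.t.ppf(0.975, df=n - p)
est = (D @ beta_hat)[0]
ci = (est - t_crit * se_Db, est + t_crit * se_Db)
```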
Adjusted R-Squared

The coefficient of determination is:

$$R^2 = \frac{\text{SST} - \text{SSE}}{\text{SST}} = 1 - \frac{\text{SSE}}{\text{SST}}.$$

In the simple linear regression model, the R-squared provides a descriptive measure of the success of the regressor variables in explaining the variation in the dependent variable.

The R-squared will always increase when adding additional regressor variables, even if the added regressor variables do not strongly influence the dependent variable.

An alternative is to correct it for the number of regressor variables present. Thus, we define the adjusted R-squared:

$$R_a^2 = 1 - \frac{\text{SSE}/(n-p)}{\text{SST}/(n-1)} = 1 - \frac{s^2}{\text{MST}} = 1 - \frac{n-1}{n-p}\left(1 - R^2\right).$$
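Both definitions can be sketched from a fitted model; the data below are hypothetical, and the last line checks that the two forms of the adjusted R-squared agree:

```python
import numpy as np

# Hypothetical data: n = 30 observations, p = 4 parameters.
rng = np.random.default_rng(2)
n, p = 30, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 0.5, -0.5, 0.0]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

sse = resid @ resid                    # sum of squared residuals
sst = np.sum((y - y.mean()) ** 2)      # total sum of squares

r2 = 1 - sse / sst
r2_adj = 1 - (sse / (n - p)) / (sst / (n - 1))

# Equivalent form: 1 - (n-1)/(n-p) * (1 - R^2).
assert np.isclose(r2_adj, 1 - (n - 1) / (n - p) * (1 - r2))
```

Since $(n-1)/(n-p) \geq 1$, the adjusted R-squared never exceeds the ordinary R-squared.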
Can we test whether the regression explains anything significant? E.g., can we jointly test whether $[\beta_1, \ldots, \beta_{p-1}]^\top = 0$ (note: excluding $\beta_0$)?

Use the F-statistic:

$$F = \frac{|\tilde X\hat\beta|^2/(p-1)}{|\hat\varepsilon|^2/(n-p)} = \frac{\text{SSM}/(p-1)}{\text{SSE}/(n-p)} \sim F_{p-1,\,n-p}.$$

Under the strong assumptions, $|\tilde X\hat\beta|^2/\sigma^2 \sim \chi^2_{p-1}$ and $|\hat\varepsilon|^2/\sigma^2 \sim \chi^2_{n-p}$ are chi-squared distributed (note: $\tilde X$ is the matrix $X$ without the constant column).

Interpretation: if the regression model explains a large proportion of the variability in $y$, then $|\tilde X\hat\beta|^2$ should be large and $|\hat\varepsilon|^2$ should be small.

Hence, test $H_0: \beta_1 = \ldots = \beta_{p-1} = 0$ v.s. $H_1$: at least one $\beta_k \neq 0$.

Reject $H_0$ if $F > F_{p-1,\,n-p}(1-\alpha)$.
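The overall F-test can be sketched from the sums of squares, with scipy supplying the F-distribution tail (simulated, hypothetical data with a genuine signal):

```python
import numpy as np
from scipy import stats

# Hypothetical data where the regressors clearly matter.
rng = np.random.default_rng(3)
n, p = 40, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, p - 1))])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=0.5, size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
resid = y - X @ beta_hat

sse = resid @ resid
sst = np.sum((y - y.mean()) ** 2)
ssm = sst - sse

# F = [SSM/(p-1)] / [SSE/(n-p)], compared against F_{p-1, n-p}.
F = (ssm / (p - 1)) / (sse / (n - p))
p_value = stats.f.sf(F, p - 1, n - p)   # upper tail: 1 - F_{p-1,n-p}(F)
```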
ANOVA table and sum of squares:
- SST is the total variability in the absence of knowledge of the variables $X_1, \ldots, X_{p-1}$;
- SSE is the total variability remaining after introducing the effect of $X_1, \ldots, X_{p-1}$;
- SSM is the total variability "explained" because of knowledge of $X_1, \ldots, X_{p-1}$.

This partitioning of the variability is used in ANOVA tables:

Source      Sum of squares      Degrees of freedom   Mean square      F         p-value
Regression  SSM = Σ(ŷᵢ − ȳ)²    DFM = p − 1          MSM = SSM/DFM    MSM/MSE   1 − F_DFM,DFE(F)
Error       SSE = Σ(yᵢ − ŷᵢ)²   DFE = n − p          MSE = SSE/DFE
Total       SST = Σ(yᵢ − ȳ)²    DFT = n − 1          MST = SST/DFT
Example regression output (= summary)

Error variance and standard deviation:

$$s^2 = \text{MSE} = \frac{\sum_{i=1}^n \hat\varepsilon_i^2}{n-p}, \qquad \text{CI for } \sigma^2: \left[\frac{\text{SSE}}{\chi^2_{1-\alpha/2}(n-p)},\ \frac{\text{SSE}}{\chi^2_{\alpha/2}(n-p)}\right],$$

$$s = \sqrt{s^2}, \qquad \text{CI for } \sigma: \left[\sqrt{\frac{\text{SSE}}{\chi^2_{1-\alpha/2}(n-p)}},\ \sqrt{\frac{\text{SSE}}{\chi^2_{\alpha/2}(n-p)}}\right].$$

ANOVA:

Source      Sum of squares      Degrees of freedom   Mean square      F         p-value
Regression  SSM = Σ(ŷᵢ − ȳ)²    DFM = p − 1          MSM = SSM/DFM    MSM/MSE   1 − F_DFM,DFE(F)
Error       SSE = Σ(yᵢ − ŷᵢ)²   DFE = n − p          MSE = SSE/DFE
Total       SST = Σ(yᵢ − ȳ)²    DFT = n − 1          MST = SST/DFT
Example regression output (cont.) (= summary)

$$R^2 = 1 - \frac{\text{SSE}}{\text{SST}}, \qquad R = \sqrt{R^2}, \qquad R_a^2 = 1 - \frac{\text{SSE}/(n-p)}{\text{SST}/(n-1)}, \qquad R_a = \sqrt{R_a^2}.$$

Coefficients (for each $\hat\beta_k$):

$$\hat\beta = (X^\top X)^{-1} X^\top y, \qquad se(\hat\beta_k) = \sqrt{\mathrm{Cov}(\hat\beta)_{kk}}, \qquad t = \frac{\hat\beta_k}{se(\hat\beta_k)}, \qquad \text{p-value} = 2\left(1 - t_{n-p}(|t|)\right),$$

$$\text{CI}(\beta_k): \quad \hat\beta_k - t_{1-\alpha/2}(n-p) \cdot se(\hat\beta_k), \qquad \hat\beta_k + t_{1-\alpha/2}(n-p) \cdot se(\hat\beta_k).$$

Covariance matrix:

$$\mathrm{Cov}(\hat\beta) = s^2 \cdot (X^\top X)^{-1}.$$
Exercise regression
Given is the following linear regression:

$$Y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i.$$

For our sample with 20 observations we have $\sum_{i=1}^{20}(y_i - \bar y)^2 = 53.82$:

$$(X^\top X)^{-1} = \begin{bmatrix}
0.19 & -0.08 & -0.04 \\
-0.08 & 0.11 & -0.03 \\
-0.04 & -0.03 & 0.05
\end{bmatrix}, \qquad
\hat\beta = \begin{bmatrix} 0.2 \\ 0.93 \\ 0.95 \end{bmatrix}, \qquad
\sum_{i=1}^{20} \hat\varepsilon_i^2 = 11.67.$$

a. Question: What is the estimate of the variance of the residual?
b. Question: What is the 95% CI for $\beta_1$?
c. Question: What is the 95% CI for $\beta_1 - \beta_2$?
d. Question: Are $X_1$ and $X_2$ jointly significant?
Exercise regression

a. Solution: $s^2 = \sum_{i=1}^{20} \hat\varepsilon_i^2/(n-p) = 11.67/17 = 0.69$.

b. Solution: $\widehat{\mathrm{Var}}(\hat\beta_1) = s^2 \cdot c_{11} = 0.69 \cdot 0.11 = 0.076 \Rightarrow se(\hat\beta_1) = \sqrt{0.076} = 0.276$.

F&T page 163: $t_{0.975}(17) = 2.110$, thus the 95% CI for $\beta_1$ is:

$$\left(\hat\beta_1 - t_{0.975}(17) \cdot se(\hat\beta_1),\ \hat\beta_1 + t_{0.975}(17) \cdot se(\hat\beta_1)\right) = (0.35, 1.51).$$

c. Solution: $D = [0\ \ 1\ \ {-1}]$; $\widehat{\mathrm{Var}}(D\hat\beta) = s^2 \cdot D(X^\top X)^{-1}D^\top$ is:

$$\widehat{\mathrm{Var}}(D\hat\beta) = 0.69 \cdot \begin{bmatrix} 0 & 1 & -1 \end{bmatrix}
\begin{bmatrix}
0.19 & -0.08 & -0.04 \\
-0.08 & 0.11 & -0.03 \\
-0.04 & -0.03 & 0.05
\end{bmatrix}
\begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}
= 0.69 \cdot \begin{bmatrix} -0.04 & 0.14 & -0.08 \end{bmatrix} \begin{bmatrix} 0 \\ 1 \\ -1 \end{bmatrix}
= 0.69 \cdot 0.22 = 0.151.$$
Exercise regression
c. Solution (cont.): $se(D\hat\beta) = \sqrt{\widehat{\mathrm{Var}}(D\hat\beta)} = \sqrt{0.151} = 0.389$.

F&T page 163: $t_{0.975}(17) = 2.110$, thus the 95% CI for $\beta_1 - \beta_2$ is:

$$\left(\hat\beta_1 - \hat\beta_2 - t_{0.975}(17) \cdot se(D\hat\beta),\ \hat\beta_1 - \hat\beta_2 + t_{0.975}(17) \cdot se(D\hat\beta)\right) = (-0.84, 0.80).$$

d. Solution: SST = 53.82; SSE = 11.67; SSM = 53.82 − 11.67 = 42.15;
MSM = 42.15/2 = 21.08; MSE = 11.67/17 = 0.687; F = 21.08/0.687 = 30.68.

$F_{0.01}(2, 17) = 6.112$, thus $X_1$ and $X_2$ are jointly significant even for $\alpha = 0.01$.
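The table look-ups in this exercise can be replaced by scipy, which reproduces the numbers above to rounding (the inputs are exactly those given in the exercise):

```python
import numpy as np
from scipy import stats

# Inputs from the exercise: n = 20 observations, p = 3 parameters.
n, p = 20, 3
XtX_inv = np.array([[0.19, -0.08, -0.04],
                    [-0.08, 0.11, -0.03],
                    [-0.04, -0.03, 0.05]])
beta_hat = np.array([0.2, 0.93, 0.95])
sst, sse = 53.82, 11.67

# a. s^2 = SSE / (n - p)
s2 = sse / (n - p)                                   # about 0.69

# b. 95% CI for beta_1
t_crit = stats.t.ppf(0.975, n - p)                   # about 2.110
se1 = np.sqrt(s2 * XtX_inv[1, 1])
ci_b1 = (beta_hat[1] - t_crit * se1, beta_hat[1] + t_crit * se1)

# c. 95% CI for beta_1 - beta_2 via D = [0, 1, -1]
D = np.array([0.0, 1.0, -1.0])
se_D = np.sqrt(s2 * D @ XtX_inv @ D)
est = D @ beta_hat
ci_diff = (est - t_crit * se_D, est + t_crit * se_D)

# d. joint significance: F = MSM / MSE vs F_{2,17}(0.99)
F = ((sst - sse) / (p - 1)) / (sse / (n - p))
f_crit = stats.f.ppf(0.99, p - 1, n - p)             # about 6.112
```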
Example: Multiple Linear Regression
We use a dataset consisting of salaries of football players and someregressor variables that may influence their salaries:
1. SALARY = “player’s salary”;
2. DRAFT = “the round in which player was originally drafted”;
3. YRSEXP = “the player’s experience in years”;
4. PLAYED = “the number of games played in the previous year”;
Example: Multiple Linear Regression
Regressor variables (cont.):
5. STARTED = “the number of games started in the previousyear”;
6. CITYPOP = “the population of the city in which the player isdomiciled”;
7. OFFBACK = “an indicator of player’s position in the game”(takes value 1 = offback defensive, 0 = others), i.e., it is adummy variable.
Example: Multiple Linear Regression
Summary Statistics of Variables in the Football Players Salary Data
Count Mean Median Std Dev Minimum Maximum
SALARY 169 336809 265000 255118 75000 1500000
DRAFT 169 6.473 5 4.61 1 13
YRSEXP 169 4.077 4 3.352 0 17
PLAYED 169 10.237 14 6.999 0 16
STARTED 169 5.97 1 6.859 0 16
CITYPOP 169 4980435 2421000 5098109 1176000 18120000
OFFBACK 169 0.2367 0 0.4263 0 1
Example: Multiple Linear Regression
The Correlation Matrix

          SALARY   DRAFT   YRSEXP   PLAYED   STARTED   CITYPOP   OFFBACK
SALARY
DRAFT     -0.454
YRSEXP     0.345   -0.059
PLAYED     0.212   -0.108    0.646
STARTED    0.440   -0.253    0.557    0.633
CITYPOP    0.077   -0.126    0.129    0.193     0.178
OFFBACK    0.179   -0.209   -0.050   -0.043    -0.081    -0.067
Example: Multiple Linear Regression
ANOVA Table

Source      Degrees of freedom   Sum of Squares   Mean Squares        F-Ratio   Prob(>F)
Regression  p − 1                SSM              MSM = SSM/(p − 1)   MSM/MSE   p-value
Error       n − p                SSE              MSE = SSE/(n − p)
Total       n − 1                SST              MST = SST/(n − 1)
Example: Multiple Linear Regression
From this ANOVA table, we can derive several statistics that can be used to summarise the quality of the regression model. For example:

- The coefficient of determination is defined by:

$$R^2 = \frac{\text{SSM}}{\text{SST}}$$

and has the interpretation that it gives the proportion of the total variability that is explained by the regression equation.
Example: Multiple Linear Regression
- The adjusted coefficient of determination is defined by:

$$R_a^2 = 1 - \frac{\text{SSE}/(n-p)}{\text{SST}/(n-1)} = 1 - \frac{s^2}{S_y^2}$$

and has the same interpretation as the R-squared, except that it is adjusted for the number of regressor variables.

In multiple regression, the R-squared increases as the number of variables increases, but not necessarily so for the adjusted R-squared.

It increases only if an influential variable is added.
Example: Multiple Linear Regression
- The size of a typical error, denoted by $s$, is the square root of $s^2$ and is also the square root of the error mean square:

$$s = \sqrt{s^2} = \sqrt{\text{MSE}} = \sqrt{\frac{\text{SSE}}{n-p}}.$$

It gives the average deviation of the actual $y$ from that predicted by the regression equation.
Example: Multiple Linear Regression
- The $F$-ratio, defined by:

$$F\text{-ratio} = \frac{\text{MSM}}{\text{MSE}},$$

is the test statistic used for model adequacy.

It provides another indication of how good the model is; its corresponding p-value should be as small as possible.
Example: Multiple Linear Regression
Summary of the results of the regression of the players’ salariesagainst the regressor variables:
Regression Analysis
The regression equation is
SALARY = 361663 - 19139 DRAFT + 21301 YRSEXP - 7948 PLAYED
+ 12965 STARTED - 0.00070 CITYPOP + 82941 OFFBACK
Predictor Coef SE Coef T p
Constant 361663 43734 8.17 0.000
DRAFT -19139 3674 -5.21 0.000
YRSEXP 21301 6370 3.34 0.001
PLAYED -7948 3281 -2.42 0.017
STARTED 12965 3189 4.07 0.000
CITYPOP -0.000699 0.003176 -0.22 0.826
OFFBACK 82941 38241 2.17 0.032
S = 203817 R-sq = 38.5% R-sq(adj) = 36.2%
Example: Multiple Linear Regression
ANOVA Table:
Analysis of Variance
SOURCE DF SS MS F p
Regression 6 4.20463E+12 7.00772E+11 16.87 0.000
Error 162 6.72970E+12 41541379329
Total 168 1.09343E+13
Example: Multiple Linear Regression
Improving the Regression Model
Here we give a summary of the results of the improved regression model:
Regression Analysis
The regression equation is
LOGSAL = 11.8 + 0.0733 YRSEXP - 0.00981 PLAYED + 0.0264 STARTED
+ 0.000000 CITYPOP + 0.187 OFFBACK + 0.933 1/DRAFT
Predictor Coef SE Coef T p
Constant 11.7509 0.0814 144.42 0.000
YRSEXP 0.07332 0.01471 4.98 0.000
PLAYED -0.009815 0.007607 -1.29 0.199
STARTED 0.026380 0.007596 3.47 0.001
CITYPOP 0.00000001 0.00000001 0.70 0.482
OFFBACK 0.18741 0.08691 2.16 0.033
1/DRAFT 0.9334 0.1242 7.52 0.000
S = 0.4713 R-sq = 54.6% R-sq(adj) = 52.9%
Example: Multiple Linear Regression
New ANOVA Table:
Analysis of Variance
SOURCE DF SS MS F p
Regression 6 43.3145 7.2191 32.50 0.000
Error 162 35.9891 0.2222
Total 168 79.3035
Appendix
Simple linear regression in matrix form
For simple linear regression in matrix form we have:

$$y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix}, \qquad
X = \begin{bmatrix} 1 & x_1 \\ 1 & x_2 \\ \vdots & \vdots \\ 1 & x_n \end{bmatrix}.$$

Hence:

$$X^\top X = \begin{bmatrix} n & \sum_{i=1}^n x_i \\ \sum_{i=1}^n x_i & \sum_{i=1}^n x_i^2 \end{bmatrix}$$

and:

$$(X^\top X)^{-1} = \frac{1}{\underbrace{n \cdot \sum_{i=1}^n x_i^2 - \left(\sum_{i=1}^n x_i\right)^2}_{= n \cdot \sum_{i=1}^n (x_i - \bar x)^2}}
\begin{bmatrix} \sum_{i=1}^n x_i^2 & -\sum_{i=1}^n x_i \\ -\sum_{i=1}^n x_i & n \end{bmatrix}.$$
Thus:

$$X^\top y = \begin{bmatrix} \sum_{i=1}^n y_i \\ \sum_{i=1}^n x_i y_i \end{bmatrix}.$$

Hence:

$$\hat\beta = \begin{bmatrix} \hat\beta_0 \\ \hat\beta_1 \end{bmatrix}
= (X^\top X)^{-1}(X^\top y)
= \frac{1}{n \cdot \sum_{i=1}^n (x_i - \bar x)^2}
\begin{bmatrix}
\sum_{i=1}^n x_i^2 \sum_{i=1}^n y_i - \sum_{i=1}^n x_i \sum_{i=1}^n x_i y_i \\
n \sum_{i=1}^n x_i y_i - \sum_{i=1}^n x_i \sum_{i=1}^n y_i
\end{bmatrix}.$$
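The closed-form expressions above can be checked against the generic matrix solution; the data below are simulated and hypothetical:

```python
import numpy as np

# Hypothetical simple-regression data.
rng = np.random.default_rng(5)
x = rng.normal(size=25)
y = 1.5 + 2.0 * x + rng.normal(scale=0.3, size=25)
n = len(x)

# Closed form from the appendix: common denominator n * sum((x - xbar)^2).
denom = n * np.sum((x - x.mean()) ** 2)
b0 = (np.sum(x**2) * np.sum(y) - np.sum(x) * np.sum(x * y)) / denom
b1 = (n * np.sum(x * y) - np.sum(x) * np.sum(y)) / denom

# Generic matrix solution: (X'X)^{-1} X'y.
X = np.column_stack([np.ones(n), x])
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
assert np.allclose([b0, b1], beta_hat)
```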