Post on 05-Apr-2018
7/31/2019 Multivariate Laboratory Exercise III
1/15
multivariate laboratory exercise iii 1
Student:
Asaad, Al-Ahmadgaid B.
website:www.alstat.weebly.com
email:alstated@gmail.com
Instructor:Prof. Baguio, Carolina B.
email: carolina.baguio@yahoo.com
A. Obtain data with two and three response variables.
1. Data with three response variables
Source: Rencher, A. C. (2002), Methods of Multivariate
Analysis, 2nd Edition. pg. 56
Table A.1 Calcium in Soil and Turnip Greens
Location Number   y1    y2     y3
1                 35    3.5    2.80
2                 35    4.9    2.70
3                 40    30.0   4.38
4                 10    2.8    3.21
5                 6     2.7    2.73
6                 20    2.8    2.81
7                 35    4.6    2.88
8                 35    10.9   2.90
9                 35    8.0    3.28
10                30    1.6    3.20
Table A.1 gives partial data from Kramer and Jensen (1969). Three variables were measured (in milliequivalents per 100 g) at 10 different locations in the South. The variables are y1 = available soil calcium, y2 = exchangeable soil calcium, and y3 = turnip green calcium.
Test the normality of the data using R software.
Solution:
# Enter Table A.1; y1, y2, y3 are the column names used in the tests below
Res3Data <- read.table(header = TRUE, text = "
y1 y2 y3
35 3.5 2.80
35 4.9 2.70
40 30.0 4.38
10 2.8 3.21
6 2.7 2.73
20 2.8 2.81
35 4.6 2.88
35 10.9 2.90
35 8.0 3.28
30 1.6 3.20")
Res3Data.Mat
great effect on the other observations of the variables y2 and y3, and this contributes to the outliers seen on the plot. To avoid this, it is better to test the normality, and to draw the quantile-quantile plot, of each variable separately, so that the units of the observations are homogeneous.
Figure 1: Normal Quantile-Quantile Plot of Table A.1.
Testing the Normality of each variable
a. y1=available soil calcium
library(mvnormtest)
attach(Res3Data)
shapiro.test(y1)
Shapiro-Wilk normality test
data: y1
W = 0.7874, p-value = 0.0102
Figure 2: Normal Probability Plot of variable y1. The green line is the 95% confidence interval of the data, and the purple line is the normal line. For the code of the plot, refer to the appendix.
The p-value for variable y1 is 0.0102, which is less than the 0.05 level of significance, so the null hypothesis of normality is rejected. The normal probability plot of y1 in Figure 2 agrees: there is a single point that lies outside the 95% confidence interval, which indicates a departure from normality.
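The visual check described above can be approximated with base R. The following is only a sketch under stated assumptions: it uses qqnorm() and qqline() from the stats package rather than the ggplot2 code of the appendix, so it draws the reference line but not the 95% confidence band, and the vector y1 is retyped from Table A.1.

```r
# Avoid writing Rplots.pdf when the sketch is run as a script
if (!interactive()) pdf(NULL)

# y1 = available soil calcium, retyped from Table A.1
y1 <- c(35, 35, 40, 10, 6, 20, 35, 35, 35, 30)

# Theoretical and sample quantiles (plot.it = FALSE returns them without drawing)
qq <- qqnorm(y1, plot.it = FALSE)

# Draw the points, then add the line through the first and third quartiles
plot(qq$x, qq$y,
     xlab = "Theoretical quantiles", ylab = "Sample quantiles",
     main = "Normal Q-Q plot of y1")
qqline(y1, col = "purple")
```

Points that fall far from the purple line are the ones that would be expected to leave the confidence band in Figure 2.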
b. y2=exchangeable soil calcium
library(mvnormtest)
attach(Res3Data)
shapiro.test(y2)
Shapiro-Wilk normality test
data: y2
W = 0.6405, p-value = 0.0001687
Figure 3: The normal probability plot of variable y2. The green line is the 95% confidence interval of the data, and the purple line is the normal line. For the code of the plot, refer to the appendix.
The observed p-value for variable y2 is also less than 0.05, so normality is rejected for y2 as well. Refer also to the quantile-quantile plot of variable y2, Figure 3. In the plot there are outliers that lie outside the 95% confidence interval, which coincides with the result of the test performed on y2.
c. y3 = turnip green calcium
library(mvnormtest)
attach(Res3Data)
shapiro.test(y3)
Shapiro-Wilk normality test
data: y3
W = 0.7294, p-value = 0.002001
Figure 4: The normal probability plot of variable y3. The green line is the 95% confidence interval of the data, and the purple line is the normal line. For the code of the plot, refer to the appendix.
The third variable follows the same pattern: its observations are not normally distributed, since the p-value 0.002001 is less than the 0.05 level of significance. The quantile-quantile plot of variable y3 also departs from normality, since further outliers lie outside the 95% confidence interval.
Summing up the decisions for the three variables tested, the decision of the Shapiro-Wilk test that was first applied to the data combining the three variables is confirmed: the observations in the data are not normally distributed.
Since the data are not normally distributed, and the sample size n is small, it is difficult to estimate an appropriate probability density function for the data.
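The three univariate decisions can also be collected in a single pass. The sketch below is an illustration rather than part of the original solution: it retypes Table A.1 into a data frame and applies shapiro.test() to each column, and the names alpha, p.values and decision are introduced here only for the example.

```r
# Table A.1, with the column names used in the tests above
Res3Data <- data.frame(
  y1 = c(35, 35, 40, 10, 6, 20, 35, 35, 35, 30),
  y2 = c(3.5, 4.9, 30.0, 2.8, 2.7, 2.8, 4.6, 10.9, 8.0, 1.6),
  y3 = c(2.80, 2.70, 4.38, 3.21, 2.73, 2.81, 2.88, 2.90, 3.28, 3.20)
)

alpha <- 0.05  # level of significance used throughout the exercise

# One Shapiro-Wilk p-value per column
p.values <- sapply(Res3Data, function(v) shapiro.test(v)$p.value)

# Compare each p-value with alpha
decision <- ifelse(p.values < alpha,
                   "reject normality", "do not reject normality")

print(data.frame(p.value = signif(p.values, 4), decision))
```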
2. Data with two response variables
Source: Hardle, W., et al. (2007), Multivariate Statistics:
Exercises and Solutions. pg. 336
Table A.2 Sales Data
      Sales   Price   Advert   Ass. Hours
1     230     125     200      109
2     181     99      55       107
3     165     97      105      98
4     150     115     85       71
5     97      120     0        82
6     192     100     150      103
7     181     80      85       111
8     189     90      120      93
9     172     95      110      86
10    170     125     130      78
This is a data set consisting of 10 measurements of 4 variables. The story: A textile shop manager is studying the sales of "classic blue" pullovers over 10 periods. He uses three different marketing methods and hopes to understand his sales as a fit of these variables using statistics. The variables measured are X1: Numbers of sold pullovers, X2: Price (in EUR), X3: Advertisement costs in local newspapers (in EUR), X4: Presence of a sales assistant (in hours per period).
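As an illustration, Table A.2 could be entered into R as a data frame. This is a sketch, not the original listing: the names Sales and Price are the ones used in the tests of this exercise, while Advert and AssHours are column names assumed here for X3 and X4.

```r
# Table A.2 (pullover sales), one row per period
Res2Data <- data.frame(
  Sales    = c(230, 181, 165, 150, 97, 192, 181, 189, 172, 170),   # X1
  Price    = c(125, 99, 97, 115, 120, 100, 80, 90, 95, 125),       # X2
  Advert   = c(200, 55, 105, 85, 0, 150, 85, 120, 110, 130),       # X3 (assumed name)
  AssHours = c(109, 107, 98, 71, 82, 103, 111, 93, 86, 78)         # X4 (assumed name)
)

# Quick checks on the structure and on the two response variables
str(Res2Data)
colMeans(Res2Data[, c("Sales", "Price")])
```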
Test the normality of the data using R software.
Solution:
Res2Data
Figure 5: Normal Quantile-Quantile Plot of the Table A.2.
used. Now, just as before, it is better to test the normality of each variable, to ensure the homogeneity of the measurements.
Testing the Normality of each variable
a. Sales - products sold
library(mvnormtest)
attach(Res2Data)
shapiro.test(Sales)
Shapiro-Wilk normality test
data: Sales
W = 0.9067, p-value = 0.2591
Figure 6: The normal probability plot of variable Sales. The green line is the 95% confidence interval of the data, and the purple line is the normal line. For the code of the plot, refer to the appendix.
The p-value is greater than the 0.05 level of significance, so the null hypothesis of normality is not rejected for the variable Sales. Furthermore, the quantile-quantile plot in Figure 6 is consistent with normality, since all of the points fall within the 95% confidence interval.
b. Price - Price of the products sold
library(mvnormtest)
attach(Res2Data)
shapiro.test(Price)
Shapiro-Wilk normality test
data: Price
W = 0.9187, p-value = 0.346
Figure 7: The normal probability plot of variable Price. The green line is the 95% confidence interval of the data, and the purple line is the normal line. For the code of the plot, refer to the appendix.
For this variable the p-value is also greater than the 0.05 level of significance, which means that the null hypothesis H0, that the data are normally distributed, is not rejected. As seen in Figure 7, the points are within the 95% confidence interval, which supports treating the observations of variable Price in the Sales data as normally distributed.
Summing up the conclusions for the two variables above (Sales and Price), the Sales data are taken to be normally distributed, which coincides with the Shapiro-Wilk test for multivariate data, in which the two variables were combined and tested.
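A joint test of the two variables might look like the sketch below. It assumes that the multivariate test referred to here is mshapiro.test() from the mvnormtest package loaded in the listings above; that function expects the variables in rows, so the data matrix is transposed. The object name U is introduced only for this example, and the test is skipped when the package is not installed.

```r
# Sales and Price, retyped from Table A.2
Res2Data <- data.frame(
  Sales = c(230, 181, 165, 150, 97, 192, 181, 189, 172, 170),
  Price = c(125, 99, 97, 115, 120, 100, 80, 90, 95, 125)
)

# mshapiro.test() takes a matrix with one row per variable
U <- t(as.matrix(Res2Data[, c("Sales", "Price")]))

if (requireNamespace("mvnormtest", quietly = TRUE)) {
  print(mvnormtest::mshapiro.test(U))
} else {
  message("Package 'mvnormtest' is not installed; skipping the joint test.")
}
```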
If Normal, what is the probability density function of
the data?
Let the variable Sales be S, and Price be P.
If the data set Res2Data is X, then
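$$
f(X) = \frac{1}{2\pi\,|\Sigma|^{1/2}}
\exp\left\{ -\frac{1}{2}
\begin{bmatrix} S - \mu_S & P - \mu_P \end{bmatrix}
\begin{bmatrix} \sigma_S^2 & \sigma_{SP} \\ \sigma_{PS} & \sigma_P^2 \end{bmatrix}^{-1}
\begin{bmatrix} S - \mu_S \\ P - \mu_P \end{bmatrix}
\right\} \tag{1}
$$
Note that $\Sigma$ is just equal to $\begin{bmatrix} \sigma_S^2 & \sigma_{SP} \\ \sigma_{PS} & \sigma_P^2 \end{bmatrix}$.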
And these are the values of each matrix above,
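a. $\begin{bmatrix} \sigma_S^2 & \sigma_{SP} \\ \sigma_{PS} & \sigma_P^2 \end{bmatrix} = \begin{bmatrix} 1152.46 & -88.91 \\ -88.91 & 244.27 \end{bmatrix}$
b. $\begin{bmatrix} \sigma_S^2 & \sigma_{SP} \\ \sigma_{PS} & \sigma_P^2 \end{bmatrix}^{-1} = \begin{bmatrix} 0.00089 & 0.00032 \\ 0.00032 & 0.0042 \end{bmatrix}$
c. $|\Sigma|^{1/2} = \begin{vmatrix} \sigma_S^2 & \sigma_{SP} \\ \sigma_{PS} & \sigma_P^2 \end{vmatrix}^{1/2} = 523.0691$
d. $\begin{bmatrix} S - \mu_S & P - \mu_P \end{bmatrix}$ = The output for the data is a large matrix that is not easy to reproduce here. However, the following code will generate it.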
attach(Res2Data)
FVar
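The quantities in items a to d can be recomputed with base R. This is a sketch, not the original listing: the sample means and the sample covariance matrix are plugged in for the parameters of equation (1), and the names X, Sigma, Sigma.inv, root.det, centered and dens are introduced here only for illustration.

```r
# Sales and Price, retyped from Table A.2
Sales <- c(230, 181, 165, 150, 97, 192, 181, 189, 172, 170)
Price <- c(125, 99, 97, 115, 120, 100, 80, 90, 95, 125)
X <- cbind(Sales, Price)

Sigma     <- cov(X)             # item a: sample covariance matrix
Sigma.inv <- solve(Sigma)       # item b: its inverse
root.det  <- sqrt(det(Sigma))   # item c: square root of the determinant

# item d: the 10 x 2 matrix of deviations from the column means
centered <- scale(X, center = TRUE, scale = FALSE)

# Bivariate normal density of equation (1) with the estimates plugged in
dens <- function(s, p) {
  d <- c(s, p) - colMeans(X)
  as.numeric(exp(-0.5 * t(d) %*% Sigma.inv %*% d) / (2 * pi * root.det))
}

print(round(Sigma, 2))
print(signif(Sigma.inv, 2))
print(root.det)
print(dens(172.7, 104.6))  # density evaluated at the sample means
```

The off-diagonal entries of the covariance matrix come out negative, while those of the inverse come out positive.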
Figure 8: 2D Binned Kernel Density Estimate, with bandwidth of (5,5).
contour(est$x1,est$x2,est$fhat,col = "blue")
Figure 9: Contour plot of 2D Binned Kernel Density Estimate, with bandwidth of (5,5).
b. Bandwidth = (10,10)
library(KernSmooth)
est
theta=45,phi=90,col="red",shade = 0.1)
Figure 10: 2D Binned Kernel Density Estimate, with bandwidth of (10,10).
contour(est$x1,est$x2,est$fhat,col = "red")
Figure 11: Contour plot of 2D Binned Kernel Density Estimate, with bandwidth of (10,10). This is a contour plot of Figure 10.
c. Bandwidth = (15,15)
library(KernSmooth)
est
It is observed that the three-dimensional plot of the data is not very smooth in the first plot, with bandwidth (5,5), but it becomes somewhat smoother with bandwidth (10,10). The third bandwidth, (15,15), makes it smoother still, although two circles remain visible on the plot, so it is not perfectly smooth. A further increase in the bandwidth produces a smooth, normal-looking surface. For example, the bandwidth (45,45) below gives a smooth surface over the mesh induced by the grid points.
est
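The bandwidth comparison can be sketched with bkde2D() from the KernSmooth package loaded above. The sketch rests on assumptions: it takes the two variables to be Sales and Price from Table A.2, reuses the persp() settings that appear in the listings (theta = 45, phi = 90, shade = 0.1), and introduces the names X, bandwidths and fits only for illustration.

```r
library(KernSmooth)

# Avoid writing Rplots.pdf when the sketch is run as a script
if (!interactive()) pdf(NULL)

# Assumed inputs: Sales and Price, retyped from Table A.2
Sales <- c(230, 181, 165, 150, 97, 192, 181, 189, 172, 170)
Price <- c(125, 99, 97, 115, 120, 100, 80, 90, 95, 125)
X <- cbind(Sales, Price)

# One binned 2D kernel density estimate per bandwidth
bandwidths <- c(5, 10, 15, 45)
fits <- lapply(bandwidths, function(h) bkde2D(X, bandwidth = c(h, h)))
names(fits) <- paste0("h", bandwidths)

# Surface plots side by side, one panel per bandwidth
op <- par(mfrow = c(2, 2))
for (i in seq_along(fits)) {
  est <- fits[[i]]
  persp(est$x1, est$x2, est$fhat,
        theta = 45, phi = 90, col = "red", shade = 0.1,
        xlab = "Sales", ylab = "Price", zlab = "Density",
        main = paste0("Bandwidth (", bandwidths[i], ",", bandwidths[i], ")"))
}
par(op)

# Contour plot of the smoothest estimate
contour(fits$h45$x1, fits$h45$x2, fits$h45$fhat, col = "red")
```

Each element of fits holds the grid points x1 and x2 and the density matrix fhat, so any of the estimates can be passed to contour() in the same way.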
Appendix
A. R code for Figures 2, 3, 4, 6, and 7. The code uses "Variable" as a placeholder for the variable name, so it can be modified for the different variables of a data set. Since there are two data sets (three and two response variables), "DataSet" is used as a placeholder for the data set. Thus, when using the two-response data, simply replace "DataSet" with "Res2Data". For example, to plot the Sales variable of the Sales data, replace "Variable" with Sales and "DataSet" with "Res2Data".
library(ggplot2)
attach(DataSet)
df
+ 2*qprobs*fd$vcov[1,2]
#lower bound
xpl