Data Analysis Assessment Item 2 Research Report Factors ... · The two-sample t-test conducted with...
https://www.sampleassignment.com/
Data Analysis
Assessment Item 2 Research Report
Factors Affecting Exam Performance in Data Analysis
Contents
Task 1 (t-tests)
Task 2 (Regression)
Task 3 (Further Analysis)
Task 1 (t-tests)
1. The following table reports descriptive statistics that give an insight into the distribution of marks obtained by students.
The mean final mark is 29.26, while the median is 27 and the mode is 25.5. The mean lying above the median and the mode,
together with the positive skewness of 0.42, suggests that the distribution is right-skewed rather than normal. The sample
variance of 155.74 (standard deviation 12.48) shows considerable variability of the marks around the mean.
Table 1. Descriptive Statistics
Mean 29.25737179
Standard Error 0.499580343
Median 27
Mode 25.5
Standard Deviation 12.47951298
Sample Variance 155.7382441
Kurtosis -0.547977466
Skewness 0.41893052
Range 54
Minimum 5
Maximum 59
Sum 18256.6
Count 624
Confidence Level (95.0%) 0.98106543
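Several entries in Table 1 are linked by simple identities, which allows a quick consistency check. The sketch below recomputes the mean, standard error and variance from the reported sum, count and standard deviation. It also forms the 95% interval for the mean from the reported half-width. The variable names are illustrative and not part of the original analysis.

```python
import math

# Values copied from Table 1
total, n = 18256.6, 624
sd = 12.47951298
half_width_95 = 0.98106543  # the "Confidence Level (95.0%)" row

mean = total / n                # sum divided by count
std_error = sd / math.sqrt(n)   # standard error of the mean
variance = sd ** 2              # sample variance
ci_95 = (mean - half_width_95, mean + half_width_95)

print(round(mean, 4), round(std_error, 4), round(variance, 2))
print(tuple(round(bound, 2) for bound in ci_95))
```

The recomputed values should agree with the Mean, Standard Error and Sample Variance rows of the table up to rounding.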
Figure 1. Final marks distribution
2. Hypothesis testing for determining if the average final mark has decreased in 2015 in comparison to the marks obtained in the
year 2014.
Step 1. Stating the null hypothesis
Null hypothesis: The average exam mark in 2015 has not decreased from 2014, that is, µ ≥ 27.6
Step 2. Stating the alternative hypothesis
Alternative hypothesis: The average exam mark in 2015 has decreased from 2014, that is, µ < 27.6
Step 3. Setting the level of significance
The level of significance specified for the present hypothesis test is 0.01. It is denoted by α and is the probability of
rejecting the null hypothesis when it is in fact true. The corresponding confidence level is therefore 99%. A lower level of
significance means that the data must diverge further from the null hypothesis before the result is declared significant.
Step 4. Calculation of test statistic
The test statistic for the single-sample t-test is
$t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
where $\bar{x}$ is the sample mean, $\mu$ is the hypothesized value of the mean, $s$ is the sample standard deviation and $n$ is
the total sample size.
$t = \dfrac{29.257 - 27.6}{12.480/\sqrt{624}} = \dfrac{1.657}{0.4996} \approx 3.32$
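The calculation in Step 4 can be sketched in a few lines. The function name `one_sample_t` is illustrative. The inputs are the mean, standard deviation and count from Table 1, together with the hypothesized mean of 27.6.

```python
import math

def one_sample_t(sample_mean, mu0, sample_sd, n):
    """t statistic for a one-sample t-test: (mean - mu0) / (sd / sqrt(n))."""
    return (sample_mean - mu0) / (sample_sd / math.sqrt(n))

# Mean, standard deviation and count from Table 1; 27.6 is the hypothesized mean
t_stat = one_sample_t(29.25737179, 27.6, 12.47951298, 624)
print(round(t_stat, 2))
```

The statistic is positive because the sample mean lies above the hypothesized value of 27.6.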
Step 5. Accepting or rejecting the null hypothesis
The calculated t-statistic is compared with the tabulated t-value at the 1% level of significance and 623 degrees of freedom.
For a one-tailed test with more than 600 degrees of freedom, the critical value from the t-table is 2.333. Because the
alternative hypothesis is µ < 27.6, the rejection region lies in the lower tail, that is, t < -2.333. The calculated t of about
3.3 is positive, since the sample mean of 29.26 is above 27.6, so it does not fall in the rejection region and the null
hypothesis is not rejected.
Step 6. Drawing a conclusion
Since the null hypothesis is not rejected in the previous step, there is no evidence at the 1% level that the average exam mark
in 2015 has decreased from the 2014 average of 27.6. The sample mean of 29.26 is in fact higher than 27.6.
3. (a) To test whether there is a difference in average exam performance between male and female students, the hypotheses
are stated as under:
H0: There is no statistically significant difference between the exam performance of male and female students.
H1: There is a statistically significant difference between the exam performance of male and female students.
A two-sample t-test, with equal variances assumed for the samples of males and females, is performed with the help of
MS Excel and the output table is provided as under:
Table 2. t-Test: Two-Sample Assuming Equal Variances
Variable 1 Variable 2
Mean 28.34294872 30.17179487
Variance 163.5556414 146.7438964
Observations 312 312
Pooled Variance 155.1497689
Hypothesized Mean Difference 0
df 622
t Stat -1.833850407
P(T<=t) one-tail 0.033576905
t Critical one-tail 1.647307092
P(T<=t) two-tail 0.067153811
t Critical two-tail 1.963785232
With reference to the output table above, the absolute value of t Stat is compared with t Critical two-tail: 1.83 < 1.96, and
the two-tail p-value of 0.067 exceeds 0.05. The null hypothesis is therefore not rejected at the 5% level of significance, and it
is concluded that there is no statistically significant difference between the exam performance of male and female students.
(b) The hypotheses for testing any gender difference among single degree students are specified as under:
H0: There is no statistically significant difference between the exam performance of male and female students with single degree
H1: There is a statistically significant difference between the exam performance of male and female students with single degree
The following output table gives a t-statistic of 0.24 for the 406 single degree students out of the group of 624 students.
On comparing the t-statistic with the two-tailed critical value of 1.966 at 404 degrees of freedom and the 5% level of significance,
the calculated value of t is less than the tabulated value; hence, the null hypothesis is not rejected.
Table 3. t-Test: Two-Sample Assuming Equal Variances
Variable 1 Variable 2
Mean 27.8852459 27.5941704
Variance 125.1598361 168.500101
Observations 183 223
Pooled Variance 148.9755262
Hypothesized Mean Difference 0
df 404
t Stat 0.239090954
P(T<=t) one-tail 0.405578149
t Critical one-tail 1.648634049
P(T<=t) two-tail 0.811156297
t Critical two-tail 1.965853275
In relation to the output displayed above, the conclusion can be drawn that exam performance of students with single degree
does not depend on the gender of students.
(c) In the present case, double degree students are tested for any possible difference in exam performance based on their gender.
The hypotheses for testing this proposition are provided as under:
H0: There is no statistically significant difference between the exam performance of male and female students with double degree
H1: There is a statistically significant difference between the exam performance of male and female students with double degree
The two-sample t-test conducted with equal variances assumed yields the output table presented as under. As observed from the
following table, 218 of the 624 students are pursuing double degrees (129 in one gender group and 89 in the other), and the
t-statistic of 1.86 is less than the two-tailed critical value of 1.971, which supports not rejecting H0. Therefore, no
statistically significant difference in exam performance between males and females is observed for double degree students.
Table 4. t-Test: Two-Sample Assuming Equal Variances
Variable 1 Variable 2
Mean 33.41550388 30.21910112
Variance 160.5014765 147.9599719
Observations 129 89
Pooled Variance 155.3919746
Hypothesized Mean Difference 0
df 216
t Stat 1.860839066
P(T<=t) one-tail 0.032062901
t Critical one-tail 1.651938651
P(T<=t) two-tail 0.064125801
t Critical two-tail 1.971007472
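The pooled-variance t-tests in Tables 2, 3 and 4 can be reproduced from the Mean, Variance and Observations rows alone. The sketch below is a minimal version of that calculation. The function name `pooled_t` and the dictionary keys are illustrative, and the critical values are simply those reported in the tables.

```python
import math

def pooled_t(mean1, var1, n1, mean2, var2, n2):
    """Return (pooled variance, t statistic, degrees of freedom)."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return pooled_var, (mean1 - mean2) / std_err, df

# Summary rows copied from Tables 2, 3 and 4
tests = {
    "all students": (28.34294872, 163.5556414, 312, 30.17179487, 146.7438964, 312),
    "single degree": (27.8852459, 125.1598361, 183, 27.5941704, 168.500101, 223),
    "double degree": (33.41550388, 160.5014765, 129, 30.21910112, 147.9599719, 89),
}
# Two-tailed 5% critical values as reported in the same tables
t_critical = {"all students": 1.963785232, "single degree": 1.965853275,
              "double degree": 1.971007472}

results = {}
for name, stats in tests.items():
    pooled_var, t_stat, df = pooled_t(*stats)
    reject = abs(t_stat) > t_critical[name]  # two-tailed decision rule
    results[name] = (pooled_var, t_stat, df, reject)
    print(f"{name}: t = {t_stat:.3f}, df = {df}, reject H0 = {reject}")
```

Comparing the absolute value of t with the two-tailed critical value keeps the decision rule the same whichever group is listed first.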
(d) The pie chart in Figure 2 shows the composition of males and females in the sample, which is an equal percentage for
both genders. As calculated in part (a), no significant difference in exam performance is found between the two genders.
Likewise, when the same test was carried out for single degree students only, their performance did not differ by gender, and the
analysis of double degree students also found no significant difference between males and females. In sum, there is no
significant gender difference in exam performance, whether measured across all students, single degree students or double
degree students.
Figure 2. Pie chart of males and females in the sample (males 50%, females 50%)
Task 2 (Regression)
(a) and (b) The regression output for each of the four cases is presented hereunder:
Step 1: Gender only
The regression model is expressed as – marks = 26.51+1.8288*(gender)
The coefficient of determination for the current model is very low, at only 0.005, which implies that the model explains
only 0.5% of the variation in the dependent variable of marks. The intercept of 26.51 is the predicted mark when the
qualitative variable of "gender" equals zero.
Table 5. SUMMARY OUTPUT
Regression Statistics
Multiple R 0.073333
R Square 0.005378
Adjusted R Square 0.003779
Standard Error 12.45591
Observations 624
ANOVA
df SS MS F Significance F
Regression 1 521.7698 521.7698 3.363007 0.067154
Residual 622 96503.16 155.1498
Total 623 97024.93
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 26.5141 1.576824 16.81488 1.4E-52 23.41756 29.61065 23.41756 29.61065
X Variable 1 (gender) 1.828846 0.997271 1.83385 0.067154 -0.12958 3.787273 -0.12958 3.787273
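With a single two-level predictor, the regression in Table 5 carries the same information as the pooled t-test in Table 2. The slope equals the difference between the two group means, the slope's t statistic matches the two-sample t statistic up to sign, and the ANOVA F equals t squared. A small check using only the reported figures, with illustrative variable names:

```python
# Values copied from Table 2 (pooled t-test) and Table 5 (regression on gender)
mean_group1, mean_group2 = 28.34294872, 30.17179487
t_two_sample = -1.833850407
slope, slope_se = 1.828846, 0.997271
f_stat = 3.363007

mean_difference = mean_group2 - mean_group1  # should match the slope
slope_t = slope / slope_se                   # should match |t| from Table 2

print(round(mean_difference, 4), round(slope_t, 4), round(slope_t ** 2, 3))
```

This is also why the P-value of the gender coefficient (0.067154) coincides with the two-tail p-value of the t-test in Table 2.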
Step 2: Gender and Degree type
The regression model is expressed as – marks = 21.65+1.2900*(gender)+4.203*(degree type)
In the above regression model, degree type has the greater influence. The corresponding p-values show that degree type is
statistically significant (p = 6.24E-05), while for gender the p-value of 0.195 is greater than 0.05 and hence gender is of no
statistical significance. As far as model fit is concerned, a coefficient of determination of only 3% is observed, which is
nevertheless greater than that of the model constructed with only "gender" as the independent variable.
Table 6. SUMMARY OUTPUT
Regression Statistics
Multiple R 0.175325
R Square 0.030739
Adjusted R Square 0.027617
Standard Error 12.30598
Observations 624
ANOVA
df SS MS F Significance F
Regression 2 2982.444 1491.222 9.847132 6.16E-05
Residual 621 94042.48 151.4372
Total 623 97024.93
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 21.65068 1.970418 10.98786 8.69E-26 17.78119 25.52017 17.78119 25.52017
X Variable 1 (gender) 1.289963 0.994295 1.297364 0.194988 -0.66263 3.242551 -0.66263 3.242551
X Variable 2 (degree type) 4.203291 1.042746 4.030981 6.24E-05 2.155555 6.251027 2.155555 6.251027
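The t Stat column in Table 6 is each coefficient divided by its standard error, and the 5% significance check follows from the resulting p-value. The sketch below uses a normal approximation for the p-value, which is an assumption made here to stay within the Python standard library. With 621 residual degrees of freedom it should be close to, but not identical with, the t-based P-value column.

```python
from statistics import NormalDist

def t_and_approx_p(coef, std_err):
    """t statistic and a two-sided p-value from the normal approximation."""
    t_stat = coef / std_err
    p_value = 2 * (1 - NormalDist().cdf(abs(t_stat)))
    return t_stat, p_value

# Coefficients and standard errors copied from Table 6
gender_t, gender_p = t_and_approx_p(1.289963, 0.994295)
degree_t, degree_p = t_and_approx_p(4.203291, 1.042746)

for name, t_stat, p in [("gender", gender_t, gender_p),
                        ("degree type", degree_t, degree_p)]:
    print(f"{name}: t = {t_stat:.3f}, approx p = {p:.4f}, significant at 5% = {p < 0.05}")
```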
Step 3: Gender, Degree type and country of citizenship
The regression model is expressed as – marks = 17.91+1.3857 *(gender)+4.939*(degree type)+1.9784*(country of citizenship)
Based on the model summary, the coefficient of determination increases from the previous model (from 0.031 to 0.039);
however, the increase is small. Gender remains statistically insignificant (p = 0.163), whereas degree type and the third
variable, country of citizenship, are of statistical significance to the model.
Table 7. SUMMARY OUTPUT
Regression Statistics
Multiple R 0.197501
R Square 0.039006
Adjusted R Square 0.034357
Standard Error 12.26326
Observations 624
ANOVA
df SS MS F Significance F
Regression 3 3784.6 1261.533 8.388545 1.79E-05
Residual 620 93240.33 150.3876
Total 623 97024.93
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 17.91959 2.542744 7.047341 4.87E-12 12.92615 22.91302 12.92615 22.91302
X Variable 1 (gender) 1.385763 0.991712 1.397345 0.16281 -0.56176 3.333284 -0.56176 3.333284
X Variable 2 (degree type) 4.939858 1.086967 4.544625 6.62E-06 2.805275 7.074442 2.805275 7.074442
X Variable 3 (country of citizenship) 1.978415 0.856631 2.309529 0.021242 0.296164 3.660666 0.296164 3.660666
Step 4: Gender, degree type, country of citizenship and lecture attendance
The regression model comprising the four factors, namely gender, degree type, country of citizenship and lecture
attendance, is given as under:
marks = 13.588+1.100*(gender)+4.6055*(degree type)+2.009*(country of citizenship)+0.7326*(lecture attendance)
The above model states that when all four explanatory variables equal zero, the predicted mark is 13.588, while, keeping all
other factors constant, a unit increase in lecture attendance is expected to increase the mark obtained by a student by 0.73 units.
On comparing the p-value of lecture attendance (2.55E-05) with 0.05, it is found that lecture attendance plays a significant
role in predicting the marks of a student.
Table 8. SUMMARY OUTPUT
Regression Statistics
Multiple R 0.25721
R Square 0.066157
Adjusted R Square 0.060123
Standard Error 12.09855
Observations 624
ANOVA
df SS MS F Significance F
Regression 4 6418.897 1604.724 10.96311 1.35E-08
Residual 619 90606.03 146.3748
Total 623 97024.93
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 13.58858 2.708375 5.017244 6.86E-07 8.269861 18.9073 8.269861 18.9073
X Variable 1 (gender) 1.100492 0.980699 1.12215 0.262234 -0.82541 3.026393 -0.82541 3.026393
X Variable 2 (degree type) 4.60556 1.075259 4.283211 2.14E-05 2.493963 6.717158 2.493963 6.717158
X Variable 3 (country of citizenship) 2.009265 0.845157 2.377388 0.017739 0.349544 3.668987 0.349544 3.668987
X Variable 4 (lecture attendance) 0.732638 0.172699 4.242278 2.55E-05 0.393491 1.071785 0.393491 1.071785
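To illustrate how the Step 4 model is read, the sketch below evaluates the fitted equation for a hypothetical student. The report does not state how gender, degree type and country of citizenship are coded or in what units attendance is measured, so the input values used here are assumptions chosen purely for illustration. The function name `predict_marks` is likewise not from the original analysis.

```python
# Rounded coefficients from the Step 4 model
COEFFICIENTS = {
    "intercept": 13.588,
    "gender": 1.100,
    "degree_type": 4.6055,
    "citizenship": 2.009,
    "attendance": 0.7326,
}

def predict_marks(gender, degree_type, citizenship, attendance):
    """Predicted mark from the fitted four-factor linear model."""
    c = COEFFICIENTS
    return (c["intercept"] + c["gender"] * gender + c["degree_type"] * degree_type
            + c["citizenship"] * citizenship + c["attendance"] * attendance)

# Hypothetical student: the codes 1, 1, 1 and the attendance of 8 are illustrative only
base = predict_marks(1, 1, 1, 8)
one_more_lecture = predict_marks(1, 1, 1, 9)
print(round(base, 2), round(one_more_lecture - base, 4))
```

Whatever coding is used, the difference between the two predictions is the attendance coefficient, which is the sense in which a unit increase in attendance raises the predicted mark by about 0.73.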
(c) The overall model comprising all four factors of gender, degree type, country of citizenship and lecture attendance
has the highest coefficient of determination (0.066), which means it explains about 6.6% of the variance in marks
obtained by students. The fit of the regression model improves with the inclusion of another factor at each step. In the
initial step, when only gender was included, no significant influence of the independent variable was observed, and
throughout the step-wise regression models gender does not reveal any significant difference in the academic performance
of students. Therefore, it is concluded that adding degree type, country of citizenship and lecture attendance increases
the predictive ability of the multiple regression model for the dependent variable of student marks, although the share
of variance explained remains low.
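Because R Square can only rise when a predictor is added, the Adjusted R Square rows give a fairer comparison across the four steps. The sketch below recomputes them from the reported R Square values with the standard adjustment formula. The helper name `adjusted_r2` is illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k predictors."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# (number of predictors, R Square, reported Adjusted R Square) from Tables 5 to 8
models = [(1, 0.005378, 0.003779), (2, 0.030739, 0.027617),
          (3, 0.039006, 0.034357), (4, 0.066157, 0.060123)]

for k, r2, reported in models:
    print(k, round(adjusted_r2(r2, 624, k), 6), reported)
```

The adjusted values also increase at every step, so the improvement in fit is not only an artefact of adding variables.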
Task 3 (Further Analysis)
A separate regression analysis is carried out for single and double degree students, on the suspicion that the variable of
degree type might interact with the other three explanatory variables. The following table shows the regression output
reported for single degree students. Multiple R is 0.257 and the coefficient of determination (R Square) is 0.066, so the model
explains about 6.6% of the variance in the dependent variable. The regression model is developed as under:
Marks = 13.588+1.100*(gender)+4.6055*(degree type)+2.009*(country of citizenship)+0.7326*(lecture attendance)
Table 9. SUMMARY OUTPUT
Regression Statistics
Multiple R 0.257210409
R Square 0.066157194
Adjusted R Square 0.060122669
Standard Error 12.09854724
ANOVA
df SS MS F Significance F
Regression 4 6418.897 1604.724 10.96311 1.35E-08
Residual 619 90606.03 146.3748
Total 623 97024.93
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 13.58857866 2.708375 5.017244 6.86E-07 8.269861 18.9073 8.269861 18.9073
X Variable 1 1.100492252 0.980699 1.12215 0.262234 -0.82541 3.026393 -0.82541 3.026393
X Variable 2 4.605560401 1.075259 4.283211 2.14E-05 2.493963 6.717158 2.493963 6.717158
X Variable 3 2.009265361 0.845157 2.377388 0.017739 0.349544 3.668987 0.349544 3.668987
X Variable 4 0.73263803 0.172699 4.242278 2.55E-05 0.393491 1.071785 0.393491 1.071785
The regression model for students with double degree is presented as under:
Marks = 14.267+0.911*(gender)+4.644*(degree type)+1.626*(country of citizenship)+0.725*(lecture attendance)
The value of the coefficient of determination is 0.066, which shows that the model explains only about 6.6% of the variation in
student marks with this set of explanatory variables.
Table 10. SUMMARY OUTPUT
Regression Statistics
Multiple R 0.256808
R Square 0.06595
Adjusted R Square 0.059596
Standard Error 12.09717
Observations 593
ANOVA
df SS MS F Significance F
Regression 4 6075.635 1518.909 10.3792 3.96E-08
Residual 588 86048.85 146.3416
Total 592 92124.49
Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%
Intercept 14.2679 2.777546 5.136871 3.81E-07 8.812777 19.72301 8.812777 19.72301
X Variable 1 0.911667 1.006456 0.90582 0.365402 -1.06502 2.888353 -1.06502 2.888353
X Variable 2 4.644415 1.078483 4.306434 1.94E-05 2.526268 6.762563 2.526268 6.762563
X Variable 3 1.626817 0.928508 1.752076 0.080282 -0.19678 3.450413 -0.19678 3.450413
X Variable 4 0.725954 0.17593 4.126387 4.22E-05 0.380427 1.071481 0.380427 1.071481
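R Square can be cross-checked against the ANOVA block, since it equals the regression sum of squares divided by the total sum of squares. The sketch below applies this to the sums of squares printed in the Table 9 and Table 10 outputs. The function name `r_squared` is illustrative.

```python
def r_squared(ss_regression, ss_total):
    """Coefficient of determination as the explained share of total variation."""
    return ss_regression / ss_total

# Regression and Total sums of squares copied from the ANOVA rows
r2_table9 = r_squared(6418.897, 97024.93)
r2_table10 = r_squared(6075.635, 92124.49)

print(round(r2_table9, 4), round(r2_table10, 4))
```

Reading R Square from the sums of squares avoids confusing it with Multiple R, which is its square root.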
From the multiple regression analysis conducted above, gender shows no significant influence on student marks, whereas lecture
attendance and degree type have the greater impact. The best multiple regression model in a generic sense is the one
encompassing all four factors. When the analysis is differentiated on the basis of degree type, the R Square values in
Tables 9 and 10 are both about 0.066, so the output does not show a better predictive capacity for either single or double
degree students.