Data Analysis Assessment Item 2 Research Report Factors ... · The two-sample t-test conducted with...

https://www.sampleassignment.com/ Data Analysis Assessment Item 2 Research Report Factors Affecting Exam Performance in Data Analysis


Data Analysis

Assessment Item 2 Research Report

Factors Affecting Exam Performance in Data Analysis


Contents

Task 1 (t-tests)

Task 2 (Regression)

Task 3 (Further Analysis)


Task 1 (t-tests)

1. The following table presents descriptive statistics that give an insight into the distribution of marks obtained by students. The mean final mark is 29.26, and the three measures of central tendency differ from one another (mean 29.26, median 27, mode 25.5), which suggests that the data depart from a normal distribution; the positive skewness of 0.42 points the same way. The sample variance of 155.74 measures the variability of the data around the mean.

Table 1. Descriptive Statistics

Mean 29.25737179

Standard Error 0.499580343

Median 27

Mode 25.5

Standard Deviation 12.47951298

Sample Variance 155.7382441

Kurtosis -0.547977466

Skewness 0.41893052

Range 54

Minimum 5

Maximum 59

Sum 18256.6


Count 624

Confidence Level(95.0%) 0.98106543
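Several entries in Table 1 are linked by simple formulas, so they can be cross-checked against each other. The sketch below is a minimal illustration in Python using only the tabulated values; the function name `standard_error` is a hypothetical choice, and reading the "Confidence Level(95.0%)" entry as the critical t-value multiplied by the standard error is an assumption about how the spreadsheet output is defined.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of the mean: standard deviation divided by sqrt(n)."""
    return sd / math.sqrt(n)


# Values copied from Table 1.
sd = 12.47951298
n = 624
confidence_half_width = 0.98106543

se = standard_error(sd, n)           # should match the "Standard Error" row
variance = sd ** 2                   # should match the "Sample Variance" row
implied_t = confidence_half_width / se  # critical t implied by the last row

print(f"standard error  = {se:.6f}")
print(f"sample variance = {variance:.4f}")
print(f"implied t value = {implied_t:.4f}")
```

The implied critical value lands close to the two-tailed 5% critical values listed in the later t-test tables, which is what one would expect for a sample of this size.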

Figure 1. Final marks distribution

2. Hypothesis test to determine whether the average final mark decreased in 2015 in comparison to the marks obtained in 2014.


Step 1. Stating the null hypothesis

Null hypothesis: The average exam mark in 2015 has not decreased, that is, µ = 27.6

Step 2. Stating the alternative hypothesis

Alternative hypothesis: The average exam mark in 2015 has decreased from 2014, that is, µ < 27.6

Step 3. Setting the level of significance

The level of significance specified for the present hypothesis test is 0.01. It is denoted by α and is the probability of rejecting the null hypothesis when it is in fact true. The corresponding confidence level is therefore 99%. A lower level of significance means that the data must diverge further from the null hypothesis before the result is declared significant.

Step 4. Calculation of test statistic

The test statistic for the single-sample t-test is given below:

$t = \dfrac{\bar{x} - \mu}{\sqrt{s^2 / n}}$ where $\bar{x}$ is the sample mean, µ is the hypothesized value of the mean, $s^2$ is the sample variance and n is the total sample size.

$t = \dfrac{29.25 - 27.6}{\sqrt{155.738 / 624}}$

≈ 3.30
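The arithmetic of Step 4 can be checked with a few lines of Python. This is only a sketch using the rounded figures quoted in the step (sample mean 29.25, hypothesized mean 27.6, sample variance 155.738, n = 624); the function name `one_sample_t` is hypothetical. Note that the denominator is the square root of the variance divided by the sample size, i.e. the standard error.

```python
import math


def one_sample_t(sample_mean: float, hypothesized_mean: float,
                 sample_variance: float, n: int) -> float:
    """t statistic for a one-sample test computed from summary statistics."""
    std_error = math.sqrt(sample_variance / n)
    return (sample_mean - hypothesized_mean) / std_error


# Rounded inputs as quoted in Step 4.
t_stat = one_sample_t(29.25, 27.6, 155.738, 624)
print(f"t = {t_stat:.4f}")
```

The sign of the statistic matters for a one-tailed test: a positive value means the sample mean lies above the hypothesized mean.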

Step 5. Accepting or rejecting the null hypothesis

The calculated value of the t-statistic is compared with the tabulated value at the 1% level of significance and 623 degrees of freedom. At the 1% significance level, the critical value from the t-table for more than 600 degrees of freedom for a one-tailed test is 2.333. Because the alternative hypothesis is µ < 27.6, the rejection region lies in the lower tail, that is, t < -2.333. The calculated value of t (about 3.30) is positive and falls outside this region, hence the null hypothesis is not rejected.

Step 6. Drawing a conclusion

Since the null hypothesis is not rejected in the previous step, there is no evidence at the 1% level of significance that the average exam mark in 2015 has decreased from 2014. The sample mean of 29.25 in fact lies above the hypothesized value of 27.6.

3. (a) To test whether average exam performance differs between male and female students, the hypotheses are stated as follows:

H0: There is no statistically significant difference between the exam performance of male and female students.

H1: There is a statistically significant difference between the exam performance of male and female students.

A two-sample t-test with equal variances assumed for the samples of males and females is performed with the help of MS Excel, and the output table is provided below:

Table 2. t-Test: Two-Sample Assuming Equal Variances

Variable 1 Variable 2

Mean 28.34294872 30.17179487

Variance 163.5556414 146.7438964

Observations 312 312

Pooled Variance 155.1497689

Hypothesized Mean Difference 0

df 622

t Stat -1.833850407

P(T<=t) one-tail 0.033576905

t Critical one-tail 1.647307092

P(T<=t) two-tail 0.067153811

t Critical two-tail 1.963785232

With reference to the output table above, the absolute value of the t-statistic is compared with the two-tailed critical value: 1.83 < 1.96, so the null hypothesis is not rejected at the 5% level of significance (two-tailed p-value 0.067). It is therefore concluded that there is no statistically significant difference between the exam performance of male and female students.
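The pooled variance, degrees of freedom and t-statistic in Table 2 can be recomputed from the group means, variances and sizes alone. The following Python sketch assumes the standard equal-variance (pooled) formula; the function name `pooled_t` is hypothetical, and the critical value is copied from Table 2 rather than computed.

```python
import math


def pooled_t(mean1: float, var1: float, n1: int,
             mean2: float, var2: float, n2: int):
    """Equal-variance two-sample t statistic from summary statistics.

    Returns (t statistic, pooled variance, degrees of freedom).
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / std_error, pooled_var, df


# Summary statistics copied from Table 2.
t_stat, pooled_var, df = pooled_t(28.34294872, 163.5556414, 312,
                                  30.17179487, 146.7438964, 312)

t_critical_two_tail = 1.963785232  # taken from Table 2
reject_h0 = abs(t_stat) > t_critical_two_tail

print(f"t = {t_stat:.6f}, pooled variance = {pooled_var:.6f}, df = {df}")
print("reject H0" if reject_h0 else "do not reject H0")
```

For a two-tailed test the comparison uses the absolute value of the statistic, which is why the negative sign in the table does not affect the decision.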

(b) The hypotheses for testing any gender difference among single degree students are specified as follows:

H0: There is no statistically significant difference between the exam performance of male and female students with a single degree

H1: There is a statistically significant difference between the exam performance of male and female students with a single degree

The following output table gives a t-statistic of 0.24 for the 406 single degree students (groups of 183 and 223) out of the group of 624 students. On comparing the t-statistic with the two-tailed critical value at 404 degrees of freedom and the 5% level of significance, it is found that the calculated value of t is less than the tabulated value of 1.966, hence the null hypothesis is not rejected.

Table 3. t-Test: Two-Sample Assuming Equal Variances


Variable 1 Variable 2

Mean 27.8852459 27.5941704

Variance 125.1598361 168.500101

Observations 183 223

Pooled Variance 148.9755262

Hypothesized Mean Difference 0

df 404

t Stat 0.239090954

P(T<=t) one-tail 0.405578149

t Critical one-tail 1.648634049

P(T<=t) two-tail 0.811156297

t Critical two-tail 1.965853275

Based on the output displayed above, it can be concluded that the exam performance of single degree students does not differ significantly by gender.

(c) In the present case, double degree students are tested for any difference in exam performance based on their gender. The hypotheses for testing this proposition are as follows:

H0: There is no statistically significant difference between the exam performance of male and female students with a double degree

H1: There is a statistically significant difference between the exam performance of male and female students with a double degree

The two-sample t-test with equal variances assumed yields the output table presented below. As observed from the table, 218 of the 624 students (groups of 129 and 89) are pursuing double degrees. The t-statistic of 1.86 is less than the two-tailed critical value of 1.971, so H0 is not rejected. Therefore, no statistically significant difference in exam performance between male and female double degree students is observed at the 5% level of significance.

Table 4. t-Test: Two-Sample Assuming Equal Variances

Variable 1 Variable 2

Mean 33.41550388 30.21910112

Variance 160.5014765 147.9599719

Observations 129 89

Pooled Variance 155.3919746

Hypothesized Mean Difference 0

df 216

t Stat 1.860839066

P(T<=t) one-tail 0.032062901

t Critical one-tail 1.651938651

P(T<=t) two-tail 0.064125801

t Critical two-tail 1.971007472


(d) The pie chart in Figure 2 shows the composition of males and females in the sample, which is an equal percentage for both genders. As calculated in part (a), no significant difference in exam performance is found between the two genders. Likewise, when the same investigation was carried out for single degree students, their exam performance did not differ significantly by gender. A similar analysis of double degree students found that males and females pursuing a double degree do not differ significantly in their academic results. In sum, it can be concluded that there is no significant difference in academic performance between male and female students, whether measured across all students, single degree students or double degree students.

Figure 2. Pie Chart: males (50%) and females (50%)


Task 2 (Regression)

(a) and (b) The regression output for each of the four cases is presented hereunder:

Step 1: Gender only

The regression model is expressed as: marks = 26.51 + 1.8288*(gender)

The coefficient of determination for the current model is very low, only 0.005, which implies that the model explains only 0.5% of the variation in the dependent variable of marks. The intercept indicates that the predicted mark is 26.51 when the qualitative variable "gender" takes the value zero.

Table 5. SUMMARY OUTPUT

Regression Statistics

Multiple R 0.073333

R Square 0.005378

Adjusted R Square 0.003779

Standard Error 12.45591

Observations 624

ANOVA

df SS MS F Significance F

Regression 1 521.7698 521.7698 3.363007 0.067154

Residual 622 96503.16 155.1498

Total 623 97024.93

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%

Intercept 26.5141 1.576824 16.81488 1.4E-52 23.41756 29.61065 23.41756 29.61065

X Variable 1 1.828846 0.997271 1.83385 0.067154 -0.12958 3.787273 -0.12958 3.787273
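The fit statistics in Table 5 all follow from the ANOVA sums of squares. The Python sketch below recomputes R Square, Adjusted R Square and F from the regression and residual sums of squares; the function name `regression_fit_stats` is hypothetical and the formulas are the standard ones for ordinary least squares with an intercept, which is assumed to be what the spreadsheet used.

```python
def regression_fit_stats(ss_regression: float, ss_residual: float,
                         n: int, k: int):
    """Return (R^2, adjusted R^2, F) from ANOVA sums of squares.

    n is the number of observations, k the number of predictors.
    """
    ss_total = ss_regression + ss_residual
    r_squared = ss_regression / ss_total
    adj_r_squared = 1 - (1 - r_squared) * (n - 1) / (n - k - 1)
    f_stat = (ss_regression / k) / (ss_residual / (n - k - 1))
    return r_squared, adj_r_squared, f_stat


# Sums of squares copied from the ANOVA block of Table 5 (gender only).
r2, adj_r2, f_stat = regression_fit_stats(521.7698, 96503.16, n=624, k=1)

t_gender = 1.83385  # t Stat of X Variable 1 in Table 5
print(f"R^2 = {r2:.6f}, adjusted R^2 = {adj_r2:.6f}, F = {f_stat:.4f}")
print(f"t^2 = {t_gender ** 2:.4f}")
```

With a single predictor, F equals the square of the slope's t-statistic, and that t-statistic has the same magnitude and p-value (0.067154) as the equal-variance t-test in Table 2.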

Step 2: Gender and Degree type

The regression model is expressed as: marks = 21.65 + 1.2900*(gender) + 4.203*(degree type)

In the regression model displayed above, the greater influence comes from degree type. The corresponding p-values show that degree type is statistically significant (p = 6.24E-05), while for gender the p-value of 0.195 is greater than 0.05 and hence gender is of no statistical significance. As far as model fit is concerned, a coefficient of determination of only about 3% is observed, which is nevertheless greater than that of the model constructed with "gender" as the only independent variable.

Table 6. SUMMARY OUTPUT

Regression Statistics

Multiple R 0.175325

R Square 0.030739

Adjusted R Square 0.027617

Standard Error 12.30598

Observations 624

ANOVA

df SS MS F Significance F

Regression 2 2982.444 1491.222 9.847132 6.16E-05

Residual 621 94042.48 151.4372

Total 623 97024.93

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%

Intercept 21.65068 1.970418 10.98786 8.69E-26 17.78119 25.52017 17.78119 25.52017


X Variable 1 1.289963 0.994295 1.297364 0.194988 -0.66263 3.242551 -0.66263 3.242551

X Variable 2 4.203291 1.042746 4.030981 6.24E-05 2.155555 6.251027 2.155555 6.251027

Step 3: Gender, Degree type and country of citizenship

The regression model is expressed as: marks = 17.92 + 1.3858*(gender) + 4.940*(degree type) + 1.9784*(country of citizenship)

Based on the model summary, the coefficient of determination increases from the previous model (from 0.031 to 0.039); however, the increase is small. Gender remains statistically insignificant (p = 0.163), while degree type (p = 6.62E-06) and the third variable, country of citizenship (p = 0.021), are of statistical significance to the model.

Table 7. SUMMARY OUTPUT

Regression Statistics

Multiple R 0.197501

R Square 0.039006

Adjusted R Square 0.034357

Standard Error 12.26326

Observations 624

ANOVA

df SS MS F Significance F

Regression 3 3784.6 1261.533 8.388545 1.79E-05

Residual 620 93240.33 150.3876

Total 623 97024.93

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%

Intercept 17.91959 2.542744 7.047341 4.87E-12 12.92615 22.91302 12.92615 22.91302

X Variable 1 1.385763 0.991712 1.397345 0.16281 -0.56176 3.333284 -0.56176 3.333284

X Variable 2 4.939858 1.086967 4.544625 6.62E-06 2.805275 7.074442 2.805275 7.074442

X Variable 3 1.978415 0.856631 2.309529 0.021242 0.296164 3.660666 0.296164 3.660666

Step 4: Gender, degree type, country of citizenship and lecture attendance

The regression model comprising the multiple factors of gender, degree type, country of citizenship and lecture attendance is given below:

marks = 13.588 + 1.100*(gender) + 4.6056*(degree type) + 2.009*(country of citizenship) + 0.7326*(lecture attendance)

The above-mentioned model states that when all four explanatory variables equal zero, the predicted mark is 13.588, while, keeping all other factors constant, a unit increase in lecture attendance is expected to increase the marks obtained by a student by 0.73 units. The p-value of lecture attendance (2.55E-05) is well below 0.05, so lecture attendance plays a significant role in predicting the marks of a student.
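To make the interpretation of the coefficients concrete, the fitted equation can be written as a small Python function. This is only an illustrative sketch: the coefficients are those reported for the four-factor model, but the report does not state how gender, degree type and country of citizenship are coded, so the input values used below are purely hypothetical, as is the function name `predict_marks`.

```python
# Coefficients of the four-factor model as reported in the regression output.
INTERCEPT = 13.58858
B_GENDER = 1.100492
B_DEGREE_TYPE = 4.60556
B_CITIZENSHIP = 2.009265
B_ATTENDANCE = 0.732638


def predict_marks(gender: float, degree_type: float,
                  citizenship: float, attendance: float) -> float:
    """Predicted mark from the fitted linear model."""
    return (INTERCEPT
            + B_GENDER * gender
            + B_DEGREE_TYPE * degree_type
            + B_CITIZENSHIP * citizenship
            + B_ATTENDANCE * attendance)


# Hypothetical codes (1, 1, 1) and attendance levels 10 and 11; the actual
# coding scheme is an assumption, not taken from the report.
base = predict_marks(1, 1, 1, 10)
one_more_lecture = predict_marks(1, 1, 1, 11)

print(f"predicted mark at attendance 10: {base:.3f}")
print(f"effect of one extra unit of attendance: {one_more_lecture - base:.4f}")
```

Whatever coding is used for the categorical variables, the difference between the two predictions equals the attendance coefficient, which is the "holding all other factors constant" reading of the slope.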

Table 8. SUMMARY OUTPUT

Regression Statistics

Multiple R 0.25721

R Square 0.066157

Adjusted R Square 0.060123

Standard Error 12.09855

Observations 624

ANOVA

df SS MS F Significance F

Regression 4 6418.897 1604.724 10.96311 1.35E-08

Residual 619 90606.03 146.3748

Total 623 97024.93

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%

Intercept 13.58858 2.708375 5.017244 6.86E-07 8.269861 18.9073 8.269861 18.9073

X Variable 1 1.100492 0.980699 1.12215 0.262234 -0.82541 3.026393 -0.82541 3.026393

X Variable 2 4.60556 1.075259 4.283211 2.14E-05 2.493963 6.717158 2.493963 6.717158

X Variable 3 2.009265 0.845157 2.377388 0.017739 0.349544 3.668987 0.349544 3.668987

X Variable 4 0.732638 0.172699 4.242278 2.55E-05 0.393491 1.071785 0.393491 1.071785

(c) The overall model comprising all four factors of gender, degree type, country of citizenship and lecture attendance has the highest coefficient of determination, explaining about 6.6% of the variance in marks obtained by students. The fit of the regression model improves with the inclusion of another factor at each step (R Square rises from 0.005 to 0.031, 0.039 and 0.066). In the initial step, when only gender was included in the model, no major influence of the independent variable was observed, and throughout the step-wise regression models gender does not reveal any significant difference in the academic performance of students. Therefore, it is concluded that the four-factor model provides the most information about the dependent variable of student marks, with degree type, country of citizenship and lecture attendance as its significant predictors.
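One way to judge whether each added factor improves the model is a partial F-test on the drop in the residual sum of squares between consecutive steps. The report itself does not run this test; the Python sketch below is an illustration that uses only the residual sums of squares and degrees of freedom printed in the ANOVA blocks of the four regression outputs, with a hypothetical function name `partial_f`.

```python
def partial_f(sse_reduced: float, sse_full: float,
              df_full: int, added_terms: int = 1) -> float:
    """Partial F statistic for adding terms to a nested regression model."""
    drop = sse_reduced - sse_full
    return (drop / added_terms) / (sse_full / df_full)


# Residual SS and residual df for the models of Steps 1 to 4.
steps = [
    ("gender", 96503.16, 622),
    ("+ degree type", 94042.48, 621),
    ("+ country of citizenship", 93240.33, 620),
    ("+ lecture attendance", 90606.03, 619),
]

partial_f_values = []
for (_, sse_prev, _), (label, sse_curr, df_curr) in zip(steps, steps[1:]):
    f_value = partial_f(sse_prev, sse_curr, df_curr)
    partial_f_values.append(f_value)
    print(f"{label}: partial F = {f_value:.3f}")
```

When a single variable is added, the partial F equals the square of that variable's t-statistic in the larger model, so these values agree with the t Stat entries of the last-added variable in each output table.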

Task 3 (Further Analysis)

A separate regression analysis is carried out for single and double degree students on the suspicion that the variable of degree type might interact with the other three explanatory variables. The following table shows the regression output reported for single degree students. The multiple R is 0.257 and the coefficient of determination is 0.066, so the model explains about 6.6% of the variance in the dependent variable. The regression model is given below:

Marks = 13.588 + 1.100*(gender) + 4.6056*(degree type) + 2.009*(country of citizenship) + 0.7326*(lecture attendance)

Table 9. SUMMARY OUTPUT

Regression Statistics

Multiple R 0.257210409

R Square 0.066157194

Adjusted R Square 0.060122669

Standard Error 12.09854724

Observations 624

ANOVA

df SS MS F Significance F

Regression 4 6418.897 1604.724 10.96311 1.35E-08

Residual 619 90606.03 146.3748

Total 623 97024.93

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%

Intercept 13.58857866 2.708375 5.017244 6.86E-07 8.269861 18.9073 8.269861 18.9073

X Variable 1 1.100492252 0.980699 1.12215 0.262234 -0.82541 3.026393 -0.82541 3.026393

X Variable 2 4.605560401 1.075259 4.283211 2.14E-05 2.493963 6.717158 2.493963 6.717158

X Variable 3 2.009265361 0.845157 2.377388 0.017739 0.349544 3.668987 0.349544 3.668987

X Variable 4 0.73263803 0.172699 4.242278 2.55E-05 0.393491 1.071785 0.393491 1.071785

The regression model for students with a double degree is presented below:

Marks = 14.268 + 0.912*(gender) + 4.644*(degree type) + 1.627*(country of citizenship) + 0.726*(lecture attendance)

The coefficient of determination is 0.066, which shows that the model explains only about 6.6% of the variation in student marks with this set of explanatory variables.

Table 10. SUMMARY OUTPUT

Regression Statistics

Multiple R 0.256808

R Square 0.06595

Adjusted R Square 0.059596

Standard Error 12.09717

Observations 593

ANOVA

df SS MS F Significance F

Regression 4 6075.635 1518.909 10.3792 3.96E-08

Residual 588 86048.85 146.3416

Total 592 92124.49

Coefficients Standard Error t Stat P-value Lower 95% Upper 95% Lower 95.0% Upper 95.0%

Intercept 14.2679 2.777546 5.136871 3.81E-07 8.812777 19.72301 8.812777 19.72301

X Variable 1 0.911667 1.006456 0.90582 0.365402 -1.06502 2.888353 -1.06502 2.888353

X Variable 2 4.644415 1.078483 4.306434 1.94E-05 2.526268 6.762563 2.526268 6.762563

X Variable 3 1.626817 0.928508 1.752076 0.080282 -0.19678 3.450413 -0.19678 3.450413

X Variable 4 0.725954 0.17593 4.126387 4.22E-05 0.380427 1.071481 0.380427 1.071481

From the multiple regression analyses conducted above, it is concluded that the model for single degree students has the higher coefficient of determination. Gender shows no significant influence on student marks. Lecture attendance and degree type have the greatest impact on the marks of students. In general, the best multiple regression model is the one encompassing all four factors, while, when students are differentiated on the basis of degree type, the model developed for single degree students predicts their marks better than the model developed for double degree students.