The Factors that Affect GPA of Undergraduate Students
ABSTRACT
The purpose of this case study is to examine the relationship between GPA and the number of hours worked per week. A two-tailed t-test did not reject the null hypothesis H0: µ1 = µ2, that employed and unemployed students have the same mean GPA. The regression equation relating hours worked per week, f(x), to GPA, x, was f(x) = 8.899132 + 0.555767x. The coefficient of determination was 0.000602, and the correlation coefficient was 0.024546. As a second variable, the number of hours of sleep per week was tested as a factor that affects GPA. For GPA vs. number of hours of sleep, the regression equation was y = 3.357 - 0.00665x. The coefficient of determination was 0.021732, and the correlation coefficient was -0.147416.
INTRODUCTION
This case study examines the relationship between students’ GPA and the number of hours they work per week. It is thought that a student who works more will have a lower GPA because the student has less time to study. If so, the mean GPA of unemployed students should be higher than the mean GPA of employed students. Because other factors could also affect GPA, the number of hours of sleep each student got was recorded as well. This hypothesis has been discussed in the article “Term-Time Employment and the Academic Performance of Undergraduates”. Its authors state that their survey data of students from 2004 to 2008 “finds that an increase in work hours has negative effects on GPA” (Wenz & Yu, 2010). This was our original expectation as well, and it motivated collecting and analyzing the data.
We collected our data by creating a survey online and inviting our friends using social
media to take the survey. We collected 50 survey entries. Our survey was conducted using the
website Kwiksurveys.com and consisted of 16 questions (see attached survey). From these
questions, we selected the variables that we thought would correlate best with GPA.
To analyze our collected data, we used Excel to organize our data and compute statistics and
graphs.
To test whether the mean GPA of employed students differs from the mean GPA of unemployed students, we used a two-tailed, non-pooled t-test. Here µ1 and µ2 denote the mean GPAs of the two groups. Our null hypothesis, H0: µ1 = µ2, was that the mean GPA of unemployed students equals the mean GPA of employed students. Our alternative hypothesis, Ha: µ1 ≠ µ2, was that the two mean GPAs are not equal. The test was performed at the 5% significance level, α = 0.05, which corresponds to the 95% confidence level. Failing to reject the null hypothesis would mean there is no significant difference between the GPA of students who work and the GPA of students who do not.
To carry out the test, we first found the mean GPA of each group. The means were quite close. Employed students had a mean GPA of 3.08, and unemployed students had a mean GPA of 3.016. This already runs against our expectation that unemployed students would have a higher GPA.
The test statistic t measures how far apart the two sample means are relative to their variability. To compute it, subtract one sample mean from the other. Then divide by the square root of the sum of each sample's variance divided by its sample size:

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{s_1^2/n_1 + s_2^2/n_2}}$$

From this calculation, we obtained t = 0.530. Because the test is two-tailed, α = 0.05 is split between the two tails, with α/2 = 0.025 in each. This gives critical values of ±2.011. The value 0.530 lies between these critical values, in the region where we cannot reject the null hypothesis. This means there is no significant difference between the mean GPA of employed students and the mean GPA of unemployed students. This was unexpected, since we thought unemployed students would have a higher GPA because they would be able to study more.
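The t statistic and two-tailed decision rule described above can be sketched in Python using only the standard library. The function names and the GPA lists are hypothetical illustrations, not our survey responses. The critical value 2.011 is the one used above and is passed in rather than computed, because the standard library has no t distribution.

```python
import math
import statistics


def welch_t(sample1, sample2):
    """Non-pooled two-sample t statistic.

    t = (mean1 - mean2) / sqrt(s1^2 / n1 + s2^2 / n2)
    """
    n1, n2 = len(sample1), len(sample2)
    # statistics.variance is the sample variance (divides by n - 1).
    var1 = statistics.variance(sample1)
    var2 = statistics.variance(sample2)
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return (statistics.mean(sample1) - statistics.mean(sample2)) / standard_error


def two_tailed_decision(t_stat, t_critical):
    """Reject H0 only when t falls outside the interval [-t_critical, t_critical]."""
    if abs(t_stat) > t_critical:
        return "reject H0"
    return "do not reject H0"


# Hypothetical GPA samples, for illustration only (not the survey responses).
employed_gpas = [3.1, 2.9, 3.4, 2.6, 3.2]
unemployed_gpas = [3.0, 2.8, 3.3, 2.7, 3.0]

t_stat = welch_t(employed_gpas, unemployed_gpas)
print(round(t_stat, 3), two_tailed_decision(t_stat, 2.011))

# The same rule applied to the statistic reported in the text:
print(two_tailed_decision(0.530, 2.011))
```

This sketch does not compute the degrees of freedom or look up the critical value. It only shows how the statistic is formed and compared with ±t_{α/2}.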
From this result, we considered additional factors that could explain our findings. Having more time to study does not mean students actually use that time for studying. It is also possible that working students are more motivated. They know what it is like to work, most likely for a wage they are not happy with.
In a two-tailed hypothesis test, the null hypothesis is rejected if the test statistic falls outside the interval between the two critical values, -t_{α/2} and t_{α/2}. It is not rejected if the statistic falls inside that interval, as happened in our test. The higher the confidence level (the smaller α), the farther apart the two critical values are. Not rejecting the null hypothesis means our data do not show a significant difference between the two means at the 5% significance level. It does not prove that the means are exactly equal.
Our hypothesis test found no significant difference between the mean GPA of employed students and the mean GPA of unemployed students. To examine this further, we conducted a regression analysis. This analysis shows whether there is any correlation between GPA and the number of hours a student works per week. If no correlation is seen between these two variables, the regression would be consistent with the result of our hypothesis test.
These data were taken from the 50 college students who responded to our survey, all attending different universities. GPAs range from 2.0 to 3.95, and the number of hours worked per week ranges from 0 to 40. The regression analysis used the following data:
Table I: GPA and Number of Hours Worked
GPA Hours Worked per Week
2 20
2.4 10
2.5 10
2.5 3
2.5 20
2.56 8
2.56 0
2.56 12
2.58 0
2.6 10
2.6 0
2.7 15
2.89 15
2.9 20
2.9 20
2.9 25
2.9 0
2.9 0
2.9 0
2.9 0
2.9 15
3 15
3 0
3 15
3 20
3.01 5.5
3.1 6
3.1 20
3.17 27
3.2 0
3.2 0
3.2 20
3.24 20
3.27 10
3.29 20
3.3 10
3.32 6.5
3.33 12.5
3.4 10
3.4 0
3.43 40
3.5 5
3.5 0
3.5 0
3.6 0
3.67 9
3.69 5
3.7 20
3.8 25
3.95 5.5
Placing these data in a scatter plot makes it easy to see that the relationship between the number of hours worked per week and GPA is weak. The scatter plot for the collected data is shown as:
Figure 1: GPA and Number of Hours Worked
Obtaining a regression equation for the data in the scatter plot does not seem reasonable.
This is because all of the data is spread out across the scatter plot. Also, if a line of best fit were
to be drawn, most of the points would not hit the line, which is an indication that there is little to
no correlation between GPA and number of hours worked per week. The data presents as:
[Figure 1 chart: "Number of Hours worked and GPA", number of hours worked per week (vertical axis, 0 to 45) against GPA (horizontal axis, 0 to 5), with a linear trendline.]
SUMMARY OUTPUT for Figure 1
Regression Statistics
Multiple R 0.024546
R Square 0.000602
Adjusted R Square -0.02022
Standard Error 9.493849
Observations 50
ANOVA
df SS MS F Significance F
Regression 1 2.608213 2.608213 0.028937 0.865639
Residual 48 4326.392 90.13316
Total 49 4329
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 8.899132 10.08839 0.882116 0.38211 -11.3849 29.18321
X Variable 1 0.555767 3.267106 0.17011 0.865639 -6.01319 7.124724
The regression equation for the data is f(x) = 8.899132 + 0.555767x. Excel's X Variable 1 here is GPA, the horizontal axis of Figure 1. So x is GPA, and f(x) is the predicted number of hours worked per week. The slope of 0.555767 is slightly positive, suggesting that GPA and hours worked tend to increase together, but only very weakly. The best-fit line passes through only a few of the points on the scatter plot. This shows it would not be a good idea to make an inference about GPA and number of hours worked from this regression equation, because the line does not represent enough of the data.
The coefficient of determination is 0.000602. It measures the proportion of the variation in one variable that is explained by its linear relationship with the other. Since only about 0.06% of the variation is explained by the regression line, almost all of the variation goes unexplained by the line of best fit. The correlation coefficient is 0.024546. It was found by taking the square root of r² (0.000602) and giving it the sign of the slope. Because 0.024546 is nowhere near ±1 (which would indicate a perfect linear fit), it would not be a good idea to say that GPA depends on the number of hours worked. This is an example of an extremely weak positive linear correlation. Because r is so close to zero, the two variables can be considered essentially linearly uncorrelated. This implies that the regression equation is not useful for making predictions.
An outlier is an observation that lies outside the overall pattern of the data. In this data set, the outlier is the point (3.43, 40), that is, a GPA of 3.43 with 40 hours worked per week. This data point lies far from the regression line relative to the other data points, and outside the region where the other points are concentrated. It is also a potential influential observation, because it has a large effect on the regression equation. Removing it changes the coefficient of determination to 0.001429 and the regression equation to f(x) = 12.35081 - 0.77004x. Both the slope and the y-intercept change considerably.
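The Excel results can be cross-checked by recomputing the least-squares fit directly from Table I. Below is a minimal Python sketch using only the standard library. It fits f(x) = b0 + b1·x, with x = GPA and y = hours worked per week, first on all 50 pairs and then without (3.43, 40). The function name `least_squares` is our own, and this sketch is a check, not the procedure we used; we used Excel's regression tool. Up to rounding, it should reproduce the slopes 0.555767 and -0.77004 and the coefficients of determination 0.000602 and 0.001429.

```python
import math

# Table I as (GPA, hours worked per week) pairs.
TABLE_I = [
    (2, 20), (2.4, 10), (2.5, 10), (2.5, 3), (2.5, 20),
    (2.56, 8), (2.56, 0), (2.56, 12), (2.58, 0), (2.6, 10),
    (2.6, 0), (2.7, 15), (2.89, 15), (2.9, 20), (2.9, 20),
    (2.9, 25), (2.9, 0), (2.9, 0), (2.9, 0), (2.9, 0),
    (2.9, 15), (3, 15), (3, 0), (3, 15), (3, 20),
    (3.01, 5.5), (3.1, 6), (3.1, 20), (3.17, 27), (3.2, 0),
    (3.2, 0), (3.2, 20), (3.24, 20), (3.27, 10), (3.29, 20),
    (3.3, 10), (3.32, 6.5), (3.33, 12.5), (3.4, 10), (3.4, 0),
    (3.43, 40), (3.5, 5), (3.5, 0), (3.5, 0), (3.6, 0),
    (3.67, 9), (3.69, 5), (3.7, 20), (3.8, 25), (3.95, 5.5),
]


def least_squares(pairs):
    """Fit y = b0 + b1 * x and return (b0, b1, r_squared, signed r)."""
    n = len(pairs)
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    b1 = sxy / sxx                      # slope
    b0 = y_bar - b1 * x_bar             # intercept: line passes through the means
    r_squared = sxy ** 2 / (sxx * syy)  # equals SSR / SST for simple regression
    r = sxy / math.sqrt(sxx * syy)      # sign matches the slope
    return b0, b1, r_squared, r


full = least_squares(TABLE_I)
without_outlier = least_squares([p for p in TABLE_I if p != (3.43, 40)])

for label, (b0, b1, r2, r) in [("all 50 points", full),
                               ("without (3.43, 40)", without_outlier)]:
    print(f"{label}: f(x) = {b0:.6f} + {b1:.6f}x, r^2 = {r2:.6f}, r = {r:.6f}")
```

Because r is computed from Sxy directly, its sign comes out automatically. Excel's Multiple R, by contrast, is always reported as the non-negative square root of R².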
By removing the outlier listed above, a new scatter plot would present as:
Figure 2: Number of Hours Worked and GPA (removal of outlier)
Obtaining a regression equation for the data in this scatter plot would once again be unreasonable, because the points are still widely scattered about the regression line. Most of the points do not fall on the line of best fit, indicating that there is little to no correlation between GPA and number of hours worked per week.
The data with the removal of the outlier, (3.43, 40) would present as:
[Figure 2 chart: "Number of Hours worked and GPA", number of hours worked per week (vertical axis, 0 to 30) against GPA (horizontal axis, 0 to 5), with a linear trendline.]
SUMMARY OUTPUT for Figure 2
Regression Statistics
Multiple R 0.037797
R Square 0.001429
Adjusted R Square -0.01982
Standard Error 8.557783
Observations 49
ANOVA
df SS MS F Significance F
Regression 1 4.924384 4.924384 0.06724 0.796531
Residual 47 3442.076 73.23565
Total 48 3447
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 12.35081 9.147798 1.35014 0.183437 -6.05218 30.75381
X Variable 1 -0.77004 2.969591 -0.25931 0.796531 -6.74408 5.20401
The new regression equation for these data is f(x) = 12.35081 - 0.77004x, with a slope of -0.77004. The slope is slightly negative, suggesting that as GPA increases, the number of hours worked decreases slightly. The best-fit line still passes through only a few points on the scatter plot, so it is still not a good idea to make inferences about the data from this new regression equation. The coefficient of determination is 0.001429. Only about 0.14% of the variation is explained by the regression line, so almost all of it goes unexplained by the line of best fit. The correlation coefficient is -0.037797. It was found by taking the square root of r² (0.001429) and giving it the sign of the slope; Excel's Multiple R (0.037797) is the absolute value of r. Because -0.037797 is nowhere near ±1 (a perfect linear fit), it would not be a good idea to say that GPA depends on the number of hours worked per week. This is an example of a very weak negative linear correlation. Because r is so close to zero, the variables can be considered essentially linearly uncorrelated. This implies that the regression equation is not useful for making predictions.
Removing the outlier at (3.43, 40) changed the slope of the regression line from positive to negative. This reverses the interpretation. Before, GPA and hours worked appeared to increase together; after, the number of hours worked appears to decrease as GPA increases. The two regression equations therefore suggest opposite relationships between GPA and number of hours worked. However, the proportion of variation explained by either line is so small that neither equation can be used dependably. Both coefficients of determination are extremely small, so neither regression line is strong enough to support inferences about the data.
The regression analysis supports the result of the hypothesis test of whether the mean GPA of unemployed students equals the mean GPA of employed students. Using the actual number of hours worked by each student, we found no meaningful linear relationship between GPA and number of hours worked. These data therefore give no evidence that GPA depends on the number of hours worked. This is consistent with the hypothesis test, which found no significant difference between the mean GPAs of employed and unemployed students.
The original data of GPA scores and hours worked per week can be further checked with a residual analysis. Two assumptions underlie regression inferences. First, a plot of the residuals against the values of GPA should fall roughly in a horizontal band, centered and symmetric about the x-axis. Second, a normal probability plot of the residuals should be roughly linear. If these assumptions are not met, regression inferences for the data may not be valid. The residual plot for the GPA data is shown as:
Figure 3: Residual Plot GPA Data
This residual plot shows that the residuals fall roughly in a horizontal band that is centered and symmetric about the x-axis, which satisfies the first assumption.
Figure 4: Normal Probability Plot for Residuals
The normal probability plot is roughly linear, which satisfies the second assumption. From these plots, we conclude that there are no obvious violations of the assumptions for regression inferences for the variables GPA and number of hours worked per week.
[Figure 3 chart: "GPA Residual Plot", residuals (vertical axis, -20 to 40) plotted against GPA (horizontal axis, 0 to 5).]
[Figure 4 chart: "Normal Probability Plot", vertical axis labeled Number of hours worked (0 to 60), horizontal axis 0 to 120.]
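The two residual checks described above can be prepared numerically before plotting. The Python sketch below computes least-squares residuals, then pairs each sorted residual with an approximate standard-normal quantile; those pairs are the points of a normal probability plot. The data are hypothetical. The plotting position (i - 0.375)/(n + 0.25) (Blom's formula) is an assumed choice, since textbooks differ slightly. The helper names are our own.

```python
from statistics import NormalDist


def fit_line(xs, ys):
    """Least-squares intercept and slope for y = b0 + b1 * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1


def residuals(xs, ys, b0, b1):
    """Residual e_i = y_i - (b0 + b1 * x_i), the vertical axis of a residual plot."""
    return [y - (b0 + b1 * x) for x, y in zip(xs, ys)]


def normal_scores(values):
    """Pair each sorted value with an approximate expected standard normal quantile.

    Uses the plotting position (i - 0.375) / (n + 0.25), an assumed choice.
    A roughly straight line of (score, value) points suggests normal residuals.
    """
    n = len(values)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    ordered = sorted(values)
    return [(std_normal.inv_cdf((i - 0.375) / (n + 0.25)), value)
            for i, value in enumerate(ordered, start=1)]


# Hypothetical (x, y) data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.0]

b0, b1 = fit_line(xs, ys)
res = residuals(xs, ys, b0, b1)
# Least-squares residuals always sum to (numerically) zero.
print("sum of residuals:", round(sum(res), 12))
for score, value in normal_scores(res):
    print(f"{score:+.3f}  {value:+.3f}")
```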
The previous attempt to determine whether the number of hours worked affects students’ GPA found an extremely weak correlation, close to no correlation at all. To explore additional factors associated with students’ GPA, we examined the relationship between GPA and the number of hours of sleep per week. The data are listed in Table II. First, a scatterplot (Figure 5) was made to visualize any apparent relationship between GPA and the number of hours of sleep.
Table II: GPA and Number of Hours of Sleep per Week. The data were obtained from a sample of undergraduate students at the University of Rhode Island.
GPA Hours of Sleep GPA Hours of Sleep
2 45 3.01 49
2.4 75 3.09 35
2.5 42 3.1 45
2.5 50 3.1 49
2.5 56 3.17 32
2.56 30 3.2 50
2.56 56 3.2 36
2.56 35 3.2 35
2.58 56 3.24 45
2.6 40 3.27 42
2.6 42 3.3 50
2.7 49 3.32 42
2.89 49 3.33 54
2.9 42 3.4 50
2.9 40 3.4 56
2.9 32 3.45 49
2.9 55 3.5 35
2.9 50 3.5 40
2.9 35 3.5 60
2.9 56 3.6 49
2.9 50 3.67 42
3 35 3.69 35
3 33 3.7 35
3 50 3.8 35
3 49 3.95 56
Figure 5: Scatterplot of GPA against the average number of hours of sleep students get over a span of 7 days (data from Table II).
The scatterplot shows the data from Table II, with the number of hours of sleep on the horizontal axis and GPA on the vertical axis. The GPA vs. hours of sleep data points do not fall exactly on a line, but they appear to scatter about one. A regression equation was therefore obtained to examine the relationship between the two variables further.
Figure 6: Regression line and data points for GPA – Number of Hours of Sleep data
[Figure 5 chart: "GPA vs. Number of Hours of Sleep", GPA (vertical axis, 0 to 4.5) against number of hours of sleep (horizontal axis, 0 to 80 hours).]
[Figure 6 chart: "GPA vs. Number of Hours of Sleep", same axes, showing the GPA data points and the regression line.]
SUMMARY OUTPUT for Figure 6
Regression Statistics
Multiple R -0.147416
R Square 0.021732
Adjusted R Square 0.001351
Standard Error 0.413929
Observations 50
ANOVA
df SS MS F Significance F
Regression 1 0.182695 0.182695 1.066288 0.30696
Residual 48 8.224193 0.171337
Total 49 8.406888
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 3.356989 0.296543 11.3204 3.75E-15 2.760749 3.953229
X Variable 1 -0.00665 0.006437 -1.03261 0.30696 -0.01959 0.006296
Table III: Summary of Regression Statistics; obtained from Microsoft Excel using Data Analysis.
From the summary of regression statistics, the linear equation is y = 3.357 - 0.00665x, where the y-intercept is 3.357 and the slope is -0.00665. Because the slope is negative, the line slopes downward: the y-values decrease as x increases. This indicates that GPA decreases as the number of hours of sleep increases, which is no surprise. The data points are somewhat scattered, but they are scattered about a line. It is therefore appropriate to fit a regression line and to compute the coefficient of determination.
The coefficient of determination, r², is a descriptive measure of the utility of the regression equation for making predictions. It is the proportion of the variation in the observed values of the response variable that is explained by the regression. It is calculated as the regression sum of squares (SSR) divided by the total sum of squares (SST). From the summary of regression statistics, SSR and SST are 0.182695 and 8.406888, respectively, giving a coefficient of determination of 0.021732. The coefficient of determination always lies between 0 and 1. A value near 0 indicates that the regression equation is not very useful for making predictions, while a value near 1 indicates that it is very useful. Since the coefficient of determination, 0.021732, is extremely close to 0, the regression equation is not very useful for making predictions.
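For reference, the same calculation is written out below using only the values in Table III. SSE is the residual sum of squares, and b_1 = -0.00665 is the slope.

```latex
\begin{aligned}
SST &= SSR + SSE = 0.182695 + 8.224193 = 8.406888 \\
r^2 &= \frac{SSR}{SST} = \frac{0.182695}{8.406888} \approx 0.021732 \\
r   &= \operatorname{sign}(b_1)\sqrt{r^2} = -\sqrt{0.021732} \approx -0.1474
\end{aligned}
```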
Another measure of the relationship between two quantitative variables is the linear correlation coefficient, r, which measures the strength of the linear relationship between them. From the regression statistics table, the linear correlation coefficient is -0.147416. Several properties of the data follow from this value. First, the sign of r indicates the type of linear relationship. Here r is negative, so the variables are negatively linearly correlated: y tends to decrease linearly as x increases, consistent with the downward trend in the scatterplot. Second, the magnitude of r indicates the strength of the linear relationship. Here |r| = 0.147416, which is far from 1 and close to 0. The linear relationship between the variables is therefore very weak, and x is a poor linear predictor of y. Third, the sign of r and the sign of the slope of the regression line are identical; both are negative. This agreement always holds, so it does not by itself mean that the regression equation is useful for making predictions. Overall, r = -0.147416 indicates an extremely weak negative linear correlation, so the two variables are essentially linearly uncorrelated.
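The formulas show why the sign of r always agrees with the sign of the regression slope. The slope is b1 = Sxy/Sxx and r = Sxy/√(Sxx·Syy). Both share the numerator Sxy, and Sxx and Syy are never negative. The short Python sketch below illustrates this on hypothetical sleep and GPA values, not the Table II data. Only the magnitude of r, not its sign, speaks to how useful the line is for prediction.

```python
import math


def slope_and_r(xs, ys):
    """Return the least-squares slope b1 and Pearson's r for paired data.

    b1 = Sxy / Sxx and r = Sxy / sqrt(Sxx * Syy). Both share the numerator Sxy,
    and Sxx, Syy are positive for non-constant data, so their signs always agree.
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    return sxy / sxx, sxy / math.sqrt(sxx * syy)


# Hypothetical weekly sleep hours and GPAs (illustration only, not Table II).
sleep_hours = [35, 42, 49, 56, 40]
gpas = [3.5, 3.2, 3.0, 2.9, 3.4]

b1, r = slope_and_r(sleep_hours, gpas)
print(f"slope = {b1:.5f}, r = {r:.4f}, same sign: {(b1 < 0) == (r < 0)}")
```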
Both the coefficient of determination and the linear correlation coefficient indicate that GPA and the number of hours of sleep are essentially uncorrelated. This raised the possibility that outliers or influential observations were affecting the results. An outlier is any data point that lies far from the regression line. An influential observation is any data point whose removal causes a significant change in the regression equation. Two influential observations, (56, 3.95) and (45, 2), were removed, and a new scatterplot was obtained.
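The definition of an influential observation given above can be checked one point at a time by refitting the line without each point and measuring how much the slope moves. The sketch below is a simple leave-one-out screen, assumed here for illustration. It is not the procedure used to choose the two points removed in this study. The data points and helper names are hypothetical.

```python
def fit_slope(points):
    """Least-squares slope for a list of (x, y) points."""
    n = len(points)
    x_bar = sum(x for x, _ in points) / n
    y_bar = sum(y for _, y in points) / n
    sxx = sum((x - x_bar) ** 2 for x, _ in points)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in points)
    return sxy / sxx


def slope_changes(points):
    """For each point, the change in slope when that point alone is removed.

    Sorted by absolute change, largest first: the top entries are candidate
    influential observations.
    """
    full_slope = fit_slope(points)
    changes = []
    for i, point in enumerate(points):
        rest = points[:i] + points[i + 1:]
        changes.append((point, fit_slope(rest) - full_slope))
    return sorted(changes, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical (hours of sleep per week, GPA) points, for illustration only.
points = [(30, 3.0), (35, 3.1), (40, 3.0), (45, 3.1), (50, 3.0), (75, 2.4)]

for point, change in slope_changes(points):
    print(point, f"{change:+.5f}")
```

In this made-up example, the point far from the others in x, (75, 2.4), produces by far the largest change in slope, which is what marks it as influential.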
Figure 7: New regression line and data points for GPA – Number of Hours of Sleep data with 2
influential observations, (56, 3.95) and (45, 2) removed.
[Figure 7 chart: "GPA vs. Number of Hours of Sleep", GPA (vertical axis, 0 to 4) against number of hours of sleep (horizontal axis, 0 to 80), showing the GPA data points and the regression line.]
SUMMARY OUTPUT for Figure 7
Regression Statistics
Multiple R -0.231041
R Square 0.05338
Adjusted R Square 0.032801
Standard Error 0.365501
Observations 48
ANOVA
df SS MS F Significance F
Regression 1 0.346528 0.346528 2.593951 0.114114
Residual 46 6.14517 0.133591
Total 47 6.491698
Coefficients Standard Error t Stat P-value Lower 95% Upper 95%
Intercept 3.477697 0.264531 13.14663 3.45E-17 2.945223 4.010172
X Variable 1 -0.00929 0.005768 -1.61057 0.114114 -0.0209 0.002321
Table IV: Summary of Regression Statistics for new scatter plot with two influential observations
removed; obtained from Microsoft Excel using Data Analysis.
From the new summary of regression statistics, the linear equation is y = 3.478 - 0.00929x, where the y-intercept is 3.478 and the slope is -0.00929. The slope is still negative, indicating that GPA decreases as the number of hours of sleep increases. The coefficient of determination for the new scatterplot is 0.05338. Although r² has increased from 0.021732, it remains close to 0, so the regression is still not very useful for making predictions. The p-value for the slope, 0.114114, is also above α = 0.05. The linear correlation coefficient for the new scatterplot is r = -0.231041. After removing the two influential observations, its magnitude is noticeably larger than before (0.147416). The negative sign matches the negative slope of the regression line and indicates a negative linear relationship. The magnitude of r, 0.231041, is still much closer to 0 than to 1, but it shows a somewhat stronger correlation between the two variables. We conclude that there is a weak negative linear correlation, with r = -0.231041.
CONCLUSION
The results were not what we initially expected. The mean GPAs of employed and unemployed students were not significantly different. This suggests that, for the students in our sample, working does not noticeably affect GPA. Also, there was little to no correlation between GPA and the number of hours a student slept. One possible reason is that students with jobs are more motivated to do well. Another is that students all study for similar amounts of time, even though some of them work during the week.
References
Weiss, N. A. (2012). Introductory Statistics (9th ed.).
Wenz, M., & Yu, W. (2010). Term-Time Employment and the Academic Performance of Undergraduates. Journal of Education Finance, 35(4), 358-373.
SAMPLE SURVEY (Factors Affecting GPA)
1) Are you currently enrolled in college?
Yes
No
2) Are you a full-time or part-time student?
Full-time
Part-time
3) What is your current class status?
Freshman
Sophomore
Junior
Senior
4) What is your current field of study (major)?
Science
Math
English
History
Other
5) What is your current GPA? (Response in 0.00 format)
6) Do you live at home, on-campus or off-campus?
Home
On-campus
Off-campus
7) If you are an off-campus student, then how many hours per week do you spend driving to school? (If you
live on campus, then mark 0 as your response)
0
0-5
5-10
10-15
15 or more
8) Are you currently employed or unemployed?
Employed
Unemployed
9) On average, how many hours per week do you work? (Response in # hours format)
10) If you commute to work, how many hours per week do you spend driving to work? (If you are not
currently working, mark 0 as your answer)
0
0-5
5-10
10-15
15 or more
11) On average, how many hours of sleep do you get per week? (Response in 0.0 hours format)
12) How many hours per week do you spend doing school-related work (ex. homework, projects, paper, lab
reports, studying, etc)?
0-5
5-10
10-15
15-20
20-25
25-30
30 or more
13) Are you currently involved in any extra-curricular activities? (Ex. sports, clubs, organizations, volunteer
work, etc)
Sports
Organizations
Clubs
Volunteer Work
None
Other
14) What is your family's average income?
0-20,000
20,000-40,000
40,000-60,000
60,000-80,000
80,000-100,000
100,000 or more
15) Are you financially independent?
Yes
No
16) What is your ethnicity?
Caucasian
African American
Native American
Asian
Hispanic
Other