Expenditures, Eligibility, and SAT Scores
Running Header: EXCEL PROJECT
Excel Project:
An Analysis of State Expenditures, Student Eligibility Percentages, and SAT Scores
Gavin L. Molitor
Seattle Pacific University
The data was collected from all 50 states and the District of Columbia (hereafter, states) as
reported in the Digest of Education Statistics, an annual publication of the U.S. Department
of Education, for a study prompted by the debate and controversy over equity in
public school expenditures and its relation to academic performance. A variety of
statistical measures were used to analyze, compare, and interpret this data, and the details are
accompanied by descriptions, results, conclusions, and a discussion.
Part I – Histograms, Box Plots, and Frequency Distribution
The data collected from each of the states includes expenditures per pupil in
average daily attendance, student/teacher ratio, SAT scores in Reading, and SAT scores in
Writing. The following histograms and corresponding box plots are used to explain the
distributions of these variables as they appear in the data. We first look at expenditures.
Figure 1.1 - Expenditures (histogram; x-axis: expenditure bins of $1,000 from <=7000 to >14000; y-axis: frequency, 0 to 12)
In histogram figure 1.1, the most frequent occurrences are for the eleven states
spending between $8,000 and $9,000 per pupil in average daily attendance in public
elementary and secondary schools, followed by ten states spending between $10,000 and
$11,000, and nine states spending between $9,000 and $10,000. From this histogram, we
can also observe that 59% of the states spend between $8,000 and $11,000. The lowest
value belongs to Utah, which spends $5,960. This data does not follow a normal
distribution. It is positively skewed (skewed to the right), which indicates that a few states are
spending significantly more than the rest, as shown in box plot figure 1.2.
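The 59% figure can be reproduced from the three bin counts quoted above. The following is a minimal sketch in Python (not part of the original Excel workbook); the variable names are our own, and the counts are simply those read from histogram figure 1.1.

```python
# Bin counts quoted in the discussion of histogram figure 1.1.
states_total = 51  # 50 states plus the District of Columbia
bin_counts = {
    "(8000, 9000]": 11,
    "(9000, 10000]": 9,
    "(10000, 11000]": 10,
}

# Share of states spending between $8,000 and $11,000 per pupil.
states_in_range = sum(bin_counts.values())
share = states_in_range / states_total
print(f"{states_in_range} of {states_total} states = {share:.1%}")
```

Thirty of the 51 states fall in these three bins, which rounds to the 59% reported.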
Figure 1.2 - Expenditures (box plot; axis: 0 to 25,000)
Figure 1.2 shows that the median amount is $9,805. After looking into the outliers
for this distribution, the District of Columbia, New York, and New Jersey spent significantly
higher amounts of money, and are largely responsible for the skewness. We now turn our
attention to the student-to-teacher ratio at the secondary level.
Figure 2.1 - Secondary Student/Teacher Ratio (histogram; x-axis: ratio bins of 1 from <=10 to >23; y-axis: frequency, 0 to 14)
In histogram figure 2.1, the most frequent value for student-to-teacher ratios fell
between 13.0 and 14.0 and occurred in thirteen different states. The distribution appears
to approach normality, but there are twenty-two states clustered between 14.0 and 17.0 as
well as a more frequent occurrence of higher ratios in other states. As a result, this data is
positively skewed, as indicated by the shorter lengths of the two lower quartiles and the longer
length of the highest-quartile whisker in box plot figure 2.1.
Figure 2.1 - Secondary Student/Teacher Ratio (box plot; axis: 0 to 30)
The median ratio in figure 2.1 is 14.8, yet the highest frequency of occurrence for
this variable was a ratio between 13.0 and 14.0. After analyzing the outliers in the data, it
became apparent that Utah, Oregon, California, and Arizona have an abnormally high
student-to-teacher ratio. Next, we look at SAT scores in Reading.
Figure 3.1 - SAT Scores in Reading (histogram; x-axis: score bins of 10 from <=480 to >600; y-axis: frequency, 0 to 12)
Histogram figure 3.1 shows that the most frequently occurring average for student SAT
scores in Reading fell between 490 and 500, for eleven different states. This data has a
strong positive skew because only three states had lower averages than the most frequent
occurrence. Box plot figure 3.2 shows that the lower 50% of scores in the data are more
clustered, and that scores in the upper 50% of the data have a much greater range.
Figure 3.2 - SAT Scores in Reading (box plot; axis: 200 to 900)
The median value for this data is 523. The lowest value is 482 and the lower hinge
value is 498. This puts the lower 50% of the data in a range of 41. The upper hinge value is
569 and the highest value is 610. This gives the upper 50% of the data a range of 87. This
shows us that the states with higher average student scores are much more spread out
across the upper range of values, whereas the states with lower averages tend to cluster
near the lower hinge value.
Figure 4.1 - SAT Scores in Writing (histogram; x-axis: score bins of 10 from <=480 to >600; y-axis: frequency, 0 to 12)
The data for SAT scores in Writing, shown in figures 4.1 and 4.2, follows a trend very similar
to that of the data for SAT scores in Reading.
Figure 4.2 - SAT Scores in Writing (box plot; axis: 200 to 900)
The median value of 511 for writing is 12 points lower than the median value of 523
for reading. The lowest value of 472 is 10 points lower, and the lower hinge value of 490.5
is 7.5 points lower. The upper hinge value of 564 is 5 points lower, but the highest value of
591 is 19 points lower. The range for the lower 50% of the data is 39, yet for the upper
50% of the data it is 80. The positive skew is very similar to that of the SAT scores for
reading, but the range of the entire distribution of SAT scores in Writing is not quite as
large.
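The ranges quoted for the lower and upper halves of each distribution follow directly from the five-number summaries. As a minimal sketch (the `FiveNumber` type and `half_ranges` helper are illustrative names of our own, and the values are those reported for figures 3.2 and 4.2), the arithmetic can be checked as follows.

```python
from typing import NamedTuple, Tuple


class FiveNumber(NamedTuple):
    """Five-number summary as read from a box plot."""
    lowest: float
    lower_hinge: float
    median: float
    upper_hinge: float
    highest: float


# Values reported in the text for figures 3.2 (Reading) and 4.2 (Writing).
reading = FiveNumber(482, 498, 523, 569, 610)
writing = FiveNumber(472, 490.5, 511, 564, 591)


def half_ranges(s: FiveNumber) -> Tuple[float, float]:
    """Return the range of the lower 50% and of the upper 50% of the data."""
    return s.median - s.lowest, s.highest - s.median


for name, summary in [("Reading", reading), ("Writing", writing)]:
    lower, upper = half_ranges(summary)
    total = summary.highest - summary.lowest
    print(f"{name}: lower half {lower}, upper half {upper}, total range {total}")
```

In both subjects the upper half spans roughly twice the range of the lower half, which is the asymmetry described above.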
In an analysis of figure 5, the frequency distribution for the categorical variable
region, we observe that the South region comprises 17 different states, whereas the
Northeast region consists of only 9 states. The West and Midwest comprise 13
states and 12 states respectively. These four regions make up our population.
Figure 5 - Regions (bar chart; categories: West, Midwest, South, Northeast; y-axis: frequency, 0 to 18)
Part II – Regional Comparisons
Having analyzed the distributions for the four continuous variables of interest as
well as the regional breakdown, we now focus on identifying differences between regions
and determining statistical and practical significance.
The ANOVA test in figure 6.1 is used to determine whether expenditures vary between the
regions. We conclude from this between-subjects test that there is a
significant difference between the regions because the F ratio is calculated to be 9.75,
which is greater than the critical value of 2.80 for F(3,47).
Figure 6.1 - Tests of Between-Subjects Effects: Expenditures
Dependent Variable: Current expenditure per pupil in average daily attendance in public elem and sec schools 2005-06
Source            Type III Sum of Squares   df   Mean Square      F      p     Partial Eta Squared
region            120,097,648.80            3    40,032,549.60    9.75   .00   .38
Error             192,953,101.83            47   4,105,385.15
Total             5,752,870,321.00          51
Corrected Total   313,050,750.63            50
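The F ratio and the partial eta squared in figure 6.1 can be recovered from the sums of squares and degrees of freedom alone. The sketch below is a check in Python rather than the original software output; the variable names are our own, and the critical value of 2.80 is taken from the text, not computed.

```python
# Sums of squares and degrees of freedom as reported in figure 6.1.
ss_region = 120_097_648.80
ss_error = 192_953_101.83
df_region, df_error = 3, 47

# Mean squares are sums of squares divided by their degrees of freedom.
ms_region = ss_region / df_region
ms_error = ss_error / df_error

# F is the ratio of between-region to within-region mean squares.
f_ratio = ms_region / ms_error

# Partial eta squared: share of variance attributable to region.
partial_eta_sq = ss_region / (ss_region + ss_error)

F_CRITICAL = 2.80  # critical value for F(3,47) at alpha = .05, as quoted in the text

print(f"F = {f_ratio:.2f}, partial eta squared = {partial_eta_sq:.2f}")
print("significant" if f_ratio > F_CRITICAL else "not significant")
```

Note that the sums of squares for region and error add up to the Corrected Total row, which is why the same denominator yields the partial eta squared of .38.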
From figure 6.2, the multiple comparisons post-hoc Tukey HSD test, we can
conclude that the difference in expenditures is statistically significant at the .05 level when
comparing the Northeast region to all three other regions, but that it is not statistically
significant when comparing these other three regions to each other. We can expect to find
statistically similar values for expenditures when comparing states in the West, Midwest,
and South, but we can expect to find significantly increased values for expenditures in the
Northeast states.
Figure 6.2 - Multiple Comparisons: Expenditures
Current expenditure per pupil in average daily attendance in public elem and sec schools 2005-06
Tukey HSD
(I) region   (J) region   Mean Difference (I-J)   Std. Error   p      95% CI Lower Bound   95% CI Upper Bound
West         Midwest      -660.49                 811.12       .85    -2820.82             1499.83
West         South        -475.96                 746.52       .92    -2464.23             1512.31
West         Northeast    -4356.52*               878.61       .00    -6696.60             -2016.45
Midwest      West         660.49                  811.12       .85    -1499.83             2820.82
Midwest      South        184.53                  763.94       1.00   -1850.14             2219.21
Midwest      Northeast    -3696.03*               893.46       .00    -6075.66             -1316.40
South        West         475.96                  746.52       .92    -1512.31             2464.23
South        Midwest      -184.53                 763.94       .995   -2219.21             1850.14
South        Northeast    -3880.56*               835.25       .000   -6105.17             -1655.96
Northeast    West         4356.52*                878.61       .00    2016.45              6696.60
Northeast    Midwest      3696.03*                893.46       .00    1316.40              6075.66
Northeast    South        3880.56*                835.25       .00    1655.96              6105.17
*. The mean difference is significant at the .05 level.
The mean differences in figure 6.2 between the expenditures of the Northeast and those
of the other regions range between $3,696.03 and $4,356.52. The practical
significance of this data is that we can expect expenditures per pupil in average daily
attendance in public elementary and secondary schools to be comparatively greater for the
9 Northeastern states than for the other 42 states in the nation.
Further research is needed to analyze regional education expenditures, and data
should be focused on categorical expenditures in high-percentage areas of each state
education budget, such as higher education, special education, and teacher salaries. Further
research should also examine differences between rural, urban, and suburban areas.
Let us now turn our attention to student/teacher ratios. An ANOVA test between
regions indicates an F ratio of 13.08. This is statistically significant, as it is greater than
2.80, the critical value of F(3,47). A Dunnett C test with an alpha level of .05 indicates that
the West region is significantly different from each of the other three regions, and we can
expect a greater student-to-teacher ratio there. In comparing this region to the other regions, the
mean differences range from 2.94 to 5.07. The Northeast has a lower ratio than the other
regions, but the difference was only statistically significant when comparing this region to
the South and the West. Figure 7 illustrates this variance in the mean ratios for each region.
Figure 7 - Descriptive Statistics: Student/Teacher Ratio
Dependent Variable: Average pupil/teacher ratio Fall 2005
region      Mean    Std. Deviation   N
West        17.81   2.90             13
Midwest     14.81   1.71             12
South       14.86   1.28             17
Northeast   12.73   1.46             9
Total       15.23   2.54             51
Because the values for the standard deviation in each region are low, we can find a
practical significance in the data and expect sample states in the region to closely represent
the population mean for that region. However, this research does not account for support
staff and other certificated teacher positions that might influence the difference between
the theoretical student/teacher ratio and the actual average number of students in a
classroom. Further research is necessary to explore the impact of such educational
positions on this ratio for each region.
Before we focus on SAT scores, it is important to observe and analyze the
data regarding the percentage of eligible students taking the SAT tests in each region. An F
ratio of 16.66, which is greater than the critical value of 2.80 for F(3,47), indicates a highly
significant difference between regions, and this percentage may be one of the most crucial variables to consider when
making comparisons and inferences about SAT performance. The mean percentage of
students taking the SAT tests in the Northeast, as indicated in figure 8.1, was 81.44. The
means for the other three regions ranged from 12.67 in the Midwest to 40.35 in the South.
It is important to observe the standard deviation as well, which indicates a much smaller,
more predictable range for the Northeastern states, and greater fluctuation in the other
three regions.
Figure 8.1 - Descriptive Statistics: SAT Eligible
Dependent Variable: Percentage of all eligible students taking the SAT 2006-07
region      Mean    Std. Deviation   N
West        33.46   18.85            13
Midwest     12.67   16.75            12
South       40.35   30.89            17
Northeast   81.44   10.33            9
Total       39.33   31.12            51
A Dunnett C test, figure 8.2, shows statistical significance at the .05 level when
comparing the Northeast to the other three regions, and the difference of the means in
these comparisons is greater than any other comparisons. There is also statistical
significance when comparing the very low mean percentage of the Midwest region to the
South region. It is important to note the confidence intervals for this data as well, which
indicate a large range in which the mean difference between two regions is likely to fall, but
also clearly indicate a direction for that difference in most comparisons. We can observe
the most obvious and strongest of these relationships by looking at the confidence intervals
when comparing the Northeast to any of the other regions.
Figure 8.2 - Multiple Comparisons: SAT Eligible
Percentage of all eligible students taking the SAT 2006-07
Dunnett C
(I) region   (J) region   Mean Difference (I-J)   Std. Error   95% CI Lower Bound   95% CI Upper Bound
West         Midwest      20.80                   7.12         -.48                 42.07
West         South        -6.89                   9.14         -33.35               19.57
West         Northeast    -47.98*                 6.26         -67.02               -28.95
Midwest      West         -20.80                  7.12         -42.07               .48
Midwest      South        -27.69*                 8.92         -53.59               -1.79
Midwest      Northeast    -68.78*                 5.94         -87.03               -50.53
South        West         6.89                    9.14         -19.57               33.35
South        Midwest      27.69*                  8.92         1.79                 53.59
South        Northeast    -41.09*                 8.25         -65.17               -17.01
Northeast    West         47.98*                  6.26         28.95                67.02
Northeast    Midwest      68.78*                  5.94         50.53                87.03
Northeast    South        41.09*                  8.25         17.01                65.17
*. The mean difference is significant at the .05 level.
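Because figure 8.2 reports confidence intervals rather than p values, a comparison is flagged as significant at the .05 level exactly when its 95% interval excludes zero. The following minimal sketch applies that rule to the six distinct region pairs; the `excludes_zero` helper and the tuple layout are our own illustration, and the bounds are copied from figure 8.2.

```python
# (region I, region J, mean difference, lower bound, upper bound) from figure 8.2.
comparisons = [
    ("West", "Midwest", 20.80, -0.48, 42.07),
    ("West", "South", -6.89, -33.35, 19.57),
    ("West", "Northeast", -47.98, -67.02, -28.95),
    ("Midwest", "South", -27.69, -53.59, -1.79),
    ("Midwest", "Northeast", -68.78, -87.03, -50.53),
    ("South", "Northeast", -41.09, -65.17, -17.01),
]


def excludes_zero(lower: float, upper: float) -> bool:
    """A 95% interval that does not contain zero implies significance at .05."""
    return lower > 0 or upper < 0


significant_pairs = [
    (i, j) for i, j, _diff, lower, upper in comparisons
    if excludes_zero(lower, upper)
]
for i, j in significant_pairs:
    print(f"{i} vs. {j}: significant at the .05 level")
```

The four pairs this rule selects are the same four marked with an asterisk in the table. The West and Midwest comparison narrowly misses, since its interval reaches just below zero.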
The practical significance of this data lies in the impact that these percentages will
have on the generalizability of conclusions about SAT scores and any correlations involving
this data. This data raises significant questions about the South, the West, and especially
the Midwest regions in terms of whether or not the sample groups taking the SAT tests are
representative of the population.
Further research is needed to determine if the means being reported for regional
SAT scores are being skewed by the percentages and the high degree of variance between
regions as indicated in this Dunnett C test. Further research might focus on the correlation
between an eligible student’s GPA and whether or not they take the SAT tests, or
percentages of SAT test takers and enrollment in higher education.
We can now focus on SAT scores in writing. We can observe from the data in figure
9.1 that the mean SAT score was highest in the Midwest with a value of 564.17 and lowest
in the Northeast with a value of 497.22. The West and the South were clustered near
the mean total for the nation, which has a value of 525.37. It is important to note that the
standard deviation is much lower in the Northeast region, indicating a lesser amount of
variance between scores.
Figure 9.1 - Descriptive Statistics: SAT Scores in Writing
Dependent Variable: Average writing SAT score 2005-06
region      Mean     Std. Deviation   N
West        515.00   25.29            13
Midwest     564.17   31.04            12
South       520.82   39.19            17
Northeast   497.22   11.24            9
Total       525.37   37.63            51
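The national mean in the Total row is the average of the regional means weighted by the number of states in each region, not a simple average of the four regional values. A minimal sketch of that calculation, using the means and N values from figure 9.1 (the dictionary layout and names are our own), is shown here.

```python
# Regional mean writing scores and number of states (N) from figure 9.1.
regions = {
    "West": (515.00, 13),
    "Midwest": (564.17, 12),
    "South": (520.82, 17),
    "Northeast": (497.22, 9),
}

total_states = sum(n for _mean, n in regions.values())

# Weight each regional mean by its number of states.
weighted_mean = sum(mean * n for mean, n in regions.values()) / total_states

# For comparison: the unweighted average of the four regional means.
simple_mean = sum(mean for mean, _n in regions.values()) / len(regions)

print(f"weighted mean = {weighted_mean:.2f} over {total_states} states")
print(f"unweighted mean of regions = {simple_mean:.2f}")
```

The weighted result matches the Total row, and it sits closest to the West and South means because those two regions contribute 30 of the 51 states.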
Figure 9.2 shows statistical significance at the .05 level when comparing the
Midwest with any of the other three regions. The mean difference ranges between
43.34 and 66.94, and indicates a higher statistical probability of greater scores in the
Midwest region. However, the confidence intervals for this data are expansive, indicating a
high degree of variance when determining the mean scores between states from different
regions.
Figure 9.2 - Multiple Comparisons: SAT Scores in Writing
Average writing SAT score 2005-06
Dunnett C
(I) region   (J) region   Mean Difference (I-J)   Std. Error   95% CI Lower Bound   95% CI Upper Bound
West         Midwest      -49.17*                 11.38        -83.24               -15.10
West         South        -5.82                   11.81        -40.07               28.42
West         Northeast    17.78                   7.95         -6.24                41.80
Midwest      West         49.17*                  11.389       15.10                83.24
Midwest      South        43.34*                  13.06        5.06                 81.63
Midwest      Northeast    66.94*                  9.71         37.43                96.46
South        West         5.82                    11.81        -28.42               40.07
South        Midwest      -43.34*                 13.06        -81.63               -5.06
South        Northeast    23.60                   10.22        -6.10                53.30
Northeast    West         -17.78                  7.95         -41.80               6.24
Northeast    Midwest      -66.94*                 9.71         -96.46               -37.43
Northeast    South        -23.60                  10.22        -53.30               6.10
*. The mean difference is significant at the .05 level.
The practical significance of this data is questionable, based on the issues previously
raised regarding the percentages of eligible students taking the SAT tests in each region. It
is beyond the scope and limitations of this data to infer causation between these variables.
Any conclusions drawn from the initial report involving SAT scores may be distorted
significantly by a gross sampling error of the students taking SAT tests in various states and
regions. We take a closer look at correlations between these and other variables in Part III.
Part III – Correlation, Scatterplots, and Regression Equations
Here we look to understand correlation and analyze the relationship between
variables by looking at scatterplots and regression equations. This analysis continues to
focus on SAT scores as the dependent variable. The independent variables are
Expenditures, Student/Teacher Ratio, and the percentage of Eligible Students who take the
SAT tests. Each scatterplot shows a negative relationship, as indicated by the downward
slope of the line of best fit. The correlation coefficient will be measured for statistical
significance against the critical value at the .05 alpha level of 0.279, which coincides with
48 degrees of freedom - the closest value we can obtain to our actual calculated df value of
49. It is important to remember that statistical significance is determined by the
absolute value of the correlation coefficient, which grows stronger as it approaches +/- 1.00, a perfect correlation. Even if a
coefficient is negative, its absolute value can exceed our critical value. It is also important to remember that we
are looking to make determinations about statistical and practical significance through
correlation, not looking to draw conclusions about causation. Let us look at our first
scatterplot.
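The critical value of 0.279 can be derived from the critical t value for 48 degrees of freedom, since r and t are related by r = t / sqrt(t^2 + df). The sketch below is a check under one stated assumption: the two-tailed critical t of 2.0106 is taken from a standard t table and does not appear in the original analysis. The function names are illustrative.

```python
import math

# Two-tailed critical t at alpha = .05 with df = 48, taken from a standard
# t table (an assumption of this sketch, not a value reported in the paper).
T_CRITICAL_DF48 = 2.0106


def critical_r(t_crit: float, df: int) -> float:
    """Convert a critical t value into the matching critical Pearson r."""
    return t_crit / math.sqrt(t_crit ** 2 + df)


def is_significant(r: float, r_crit: float) -> bool:
    """Compare the magnitude of r, not its sign, with the critical value."""
    return abs(r) > r_crit


r_crit = critical_r(T_CRITICAL_DF48, 48)
print(f"critical r = {r_crit:.3f}")

# Correlation coefficients as reported in figures 10, 11, and 12.
reported = [
    ("Expenditures", -0.3969),
    ("Student/Teacher Ratio", -0.0662),
    ("Eligible", -0.8698),
]
for label, r in reported:
    print(f"{label}: r = {r}, significant = {is_significant(r, r_crit)}")
```

Taking the absolute value before the comparison is what allows a negative coefficient to be judged against a positive critical value.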
Figure 10 - Expenditures and SAT Scores in Writing (scatterplot; x-axis: Expenditures, 4,000 to 20,000; y-axis: SAT Scores, 0 to 700; line of best fit: f(x) = −0.00596941837135119x + 587.023414437313)
Regression Equation: y = -0.006x + 587.02
Coefficient of Determination: r^2 = 0.1576
Coefficient of Correlation: r = -0.3969
In figure 10, we see the slight negative correlation between Expenditures and SAT scores.
The slope is -0.006 and the y-intercept is 587.02. The correlation coefficient of -0.397 has an absolute value greater than
our critical value of 0.279, indicating statistical significance. This relationship is weak, however, because
the correlation coefficient is low in magnitude. Most of the values on the scatterplot are clustered
between $8,000 and $12,000 and do not fall very close to the line of best fit. Though statistically
significant, Expenditures are not a strong predictor of SAT Scores, and this correlation has limited
practical significance. Let us turn our attention to our second set of variables.
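To make the limited practical significance concrete, the regression line from figure 10 can be evaluated at the two ends of the $8,000 to $12,000 cluster, and the coefficient of determination can be recomputed from r. This is a minimal sketch using the fitted coefficients shown on the scatterplot; the `predict_writing_score` function name is our own.

```python
# Coefficients of the line of best fit shown in figure 10.
SLOPE = -0.00596941837135119
INTERCEPT = 587.023414437313
R = -0.3969  # coefficient of correlation reported for figure 10


def predict_writing_score(expenditure: float) -> float:
    """Predicted mean SAT writing score for a given expenditure per pupil."""
    return SLOPE * expenditure + INTERCEPT


low = predict_writing_score(8_000)
high = predict_writing_score(12_000)
change = high - low

# r squared is the share of variance in scores associated with expenditures.
r_squared = R ** 2

print(f"predicted at $8,000: {low:.1f}; at $12,000: {high:.1f}")
print(f"change across the cluster: {change:.1f} points")
print(f"r^2 = {r_squared:.3f}")
```

Across the range where most states fall, the fitted line moves by only about 24 points, and expenditures account for roughly 16% of the variance in scores.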
Figure 11 - Student/Teacher Ratio and SAT Scores in Writing (scatterplot; x-axis: Student/Teacher Ratio, 10 to 24; y-axis: SAT Scores, 0 to 700; line of best fit: f(x) = −0.987225596120366x + 540.337727614915)
Regression Equation: y = -0.9872x + 540.34
Coefficient of Determination: r^2 = 0.0044
Coefficient of Correlation: r = -0.0662
In looking at the relationship between Student/Teacher Ratio and SAT Scores, we can see in
figure 11 that there is a slight negative correlation. The regression equation calculates the slope at
-0.987 and the y-intercept at 540.34. The slight downward direction of the line of best fit again
shows the negative relationship between these two variables. The correlation coefficient is
-0.066. Because the absolute value of this coefficient is less than our critical value of 0.279, we fail to reject the null hypothesis that
there is no relationship between these variables. Let us look at our last scatterplot.
Figure 12 - Eligible and SAT Scores in Writing (scatterplot; x-axis: Eligible, 0 to 120; y-axis: SAT Scores, 0 to 700; line of best fit: f(x) = −1.05161947499553x + 566.736248369432)
Regression Equation: y = -1.0516x + 566.74
Coefficient of Determination: r^2 = 0.7565
Coefficient of Correlation: r = -0.8698
Figure 12 illustrates the negative correlation between the percentage of students
who graduate high school and take the SAT tests and performance on SAT tests in writing.
The correlation coefficient for this relationship is -0.870. Its absolute value is much greater than
our critical value of 0.279. It indicates a strong correlation between these two variables as
it approaches -1.00. As we look at the data values along the line of best fit, we can see that
they are either touching the line, or very close to it. The regression equation calculates the
slope to be -1.052 and the y-intercept to be 566.74.
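The strength of this relationship is easiest to see by evaluating the regression equation at a low and a high participation rate. The sketch below uses the fitted coefficients from figure 12; the function name and the two example percentages (10% and 80%) are our own illustrative choices, not values drawn from particular states.

```python
# Coefficients of the line of best fit shown in figure 12.
SLOPE = -1.05161947499553
INTERCEPT = 566.736248369432
R = -0.8698  # coefficient of correlation reported for figure 12


def predict_writing_score(percent_eligible: float) -> float:
    """Predicted mean SAT writing score for a given percentage of test takers."""
    return SLOPE * percent_eligible + INTERCEPT


# Hypothetical participation rates chosen only to illustrate the slope.
low_participation = predict_writing_score(10)
high_participation = predict_writing_score(80)

r_squared = R ** 2

print(f"predicted score at 10% participation: {low_participation:.1f}")
print(f"predicted score at 80% participation: {high_participation:.1f}")
print(f"share of variance associated with participation: {r_squared:.0%}")
```

A state where few eligible students sit the test is predicted to report a mean more than 70 points higher than a state where most do, which is a correlation and not evidence that participation causes lower scores.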
Conclusions
There is a concerning degree of skewness to the variables evaluated in Part I. While
sample error is expected, this data shows enough of a departure from normality to suggest
that some of the sample means are not representative of the population. This skewness was
most obvious in the SAT scores. By evaluating the correlations through scatterplots in Part
III, we can see that there is a strong relationship between the percentage of eligible
students who take the SAT tests and student performance on the SAT writing tests. Further
research is needed to investigate this relationship. It is necessary to evaluate those states
that have a very low percentage of students taking the SAT tests, yet report a high mean
score on the SAT test in writing, to determine whether the mean score of the sample is
representative of the population, or whether it is severely skewed data. In Part II, it was shown
that there was a high degree of variability in the percentages of students in different states
and different regions who go on to take the SAT tests. This analysis of the data related to
SAT eligibility percentages and scores raises serious concerns about external validity that
require further research and evaluation.