Expenditures, Eligibility, and SAT Scores

Running Header: EXCEL PROJECT

Excel Project:

An Analysis of State Expenditures, Student Eligibility Percentages, and SAT Scores

Gavin L. Molitor

Seattle Pacific University

The data was collected from all 50 states and the District of Columbia (states) as

reported in the Digest of Education Statistics, an annual publication of the U.S. Department

of Education, for a study done as a result of the debate and controversy over equity in

public school expenditures and the relation to academic performance. A variety of

statistical measures were used to analyze, compare, and interpret this data, and details are

accompanied by descriptions, results, conclusions, and a discussion.

Part I – Histograms, Box Plots, and Frequency Distribution

The data collected from each of the states includes expenditures per pupil in

average daily attendance, student/teacher ratio, SAT scores in Reading, and SAT scores in

Writing. The following histograms and corresponding box plots are used to explain the

distributions of these variables as they appear in the data. We first look at expenditures.

Figure 1.1 - Expenditures (histogram; bins of $1,000 from <=7000 to >14000)

In histogram figure 1.1, the most frequent occurrences are for the eleven states

spending between $8,000 and $9,000 per pupil in average daily attendance in public

elementary and secondary schools, followed by ten states spending between $10,000 and

$11,000, and nine states spending between $9,000 and $10,000. From this histogram, we

can also observe that 59% of the states spend between $8,000 and $11,000. The lowest

value is for Utah, which spends $5,960. This data does not follow a normal

distribution. It is positively skewed, with a tail to the right, which indicates that a few states are

spending significantly more money in expenditures as shown in box plot figure 1.2.
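The 59% figure can be checked directly from the three bin counts quoted above. The following is a minimal sketch that uses only those reported counts; the variable names are illustrative and the remaining bins are not reconstructed.

```python
# Bin counts as quoted in the text for Figure 1.1 (other bins are not reported).
reported_bins = {
    "(8000, 9000]": 11,
    "(9000, 10000]": 9,
    "(10000, 11000]": 10,
}
TOTAL_STATES = 51  # 50 states plus the District of Columbia

# Share of all states that fall between $8,000 and $11,000.
states_in_range = sum(reported_bins.values())
share = states_in_range / TOTAL_STATES

print(f"{states_in_range} of {TOTAL_STATES} states = {share:.0%}")
```

Thirty of the 51 states fall in these three bins, which rounds to the 59% stated in the text.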

Figure 1.2 - Expenditures (box plot)

Figure 1.2 shows that the median amount is $9,805. After looking into the outliers

for this distribution, the District of Columbia, New York, and New Jersey spent significantly

higher amounts of money, and are largely responsible for the skewness. We now turn our

attention to the student-to-teacher ratio at the secondary level.

Figure 2.1 - Secondary Student/Teacher Ratio (histogram; bins of 1 from <=10 to >23; y-axis: Frequency)

In histogram figure 2.1, the most frequent value for student-to-teacher ratios fell

between 13.0 and 14.0 and occurred in thirteen different states. The distribution appears

to approach normality, but there are twenty-two states clustered between 14.0 and 17.0 as

well as a more frequent occurrence of higher ratios in other states. As a result, this data is

positively skewed, indicated by the shorter lengths of the two lower quartiles and the longer

length of the highest quartile whisker in box plot figure 2.2.

Figure 2.2 - Secondary Student/Teacher Ratio (box plot)

The median ratio in figure 2.2 is 14.8, yet the highest frequency of occurrence for

this variable was a ratio between 13.0 and 14.0. After analyzing the outliers in the data, it

became apparent that Utah, Oregon, California, and Arizona have an abnormally high

student-to-teacher ratio. Next, we look at SAT scores in Reading.

Figure 3.1 - SAT Scores in Reading (histogram; bins of 10 from <=480 to >600)

Histogram figure 3.1 shows the most frequently occurring average for student SAT

scores in Reading fell between 490 and 500 for eleven different states. This data has a

strong positive skew because only three states had lower averages than the most frequent

occurrence. Box plot figure 3.2 shows that the lower 50% of scores in the data are more

clustered, and that scores in the upper 50% of the data have a much greater range.

Figure 3.2 - SAT Scores in Reading (box plot)

The median value for this data is 523. The lowest value is 482 and the lower hinge

value is 498. This puts the lower 50% of the data in a range of 41. The upper hinge value is

569 and the highest value is 610. This gives the upper 50% of the data a range of 87. This

shows us that the states with higher average student scores are much more spread out

across the upper range of values, whereas the states with lower averages tend to cluster

near the lower hinge value.

Figure 4.1 - SAT Scores in Writing (histogram; bins of 10 from <=480 to >600)

The data for SAT scores in writing, shown in figures 4.1 and 4.2, follows a trend very

similar to that of the data for SAT scores in reading.

Figure 4.2 - SAT Scores in Writing (box plot)

The median value of 511 for writing is 12 points lower than the median value of 523

for reading. The lowest value of 472 is 10 points lower, and the lower hinge value of 490.5

is 7.5 points lower. The upper hinge value of 564 is 5 points lower, but the highest value of

591 is 19 points lower. The range for the lower 50% of the data is 39, yet for the upper

50% of the data it is 80. The positive skew is very similar to that of the SAT scores for

reading, but the range of the entire distribution of SAT scores in Writing is not quite as

large.
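The comparisons between the reading and writing box plots are simple differences of the five values reported for each. As a minimal sketch, assuming only the values quoted in the text (the `FiveNumber` type and `half_ranges` helper are illustrative names, not part of the original analysis):

```python
from typing import NamedTuple


class FiveNumber(NamedTuple):
    lowest: float
    lower_hinge: float
    median: float
    upper_hinge: float
    highest: float


# Values quoted in the text for figures 3.2 and 4.2.
reading = FiveNumber(482, 498, 523, 569, 610)
writing = FiveNumber(472, 490.5, 511, 564, 591)


def half_ranges(summary: FiveNumber) -> tuple:
    """Range of the lower 50% and of the upper 50% of the data."""
    return summary.median - summary.lowest, summary.highest - summary.median


# How many points lower each writing value is than its reading counterpart.
gaps = {
    field: getattr(reading, field) - getattr(writing, field)
    for field in FiveNumber._fields
}

print(half_ranges(reading), half_ranges(writing))
print(gaps)
```

In both distributions the upper half of the data spans roughly twice the range of the lower half, which is the asymmetry the box plots show.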

In an analysis of figure 5, the frequency distribution for the categorical variable

regions, we observe that the South region comprises 17 different states, whereas the

Northeast region consists of only 9 states. The West and Midwest comprise 13

states and 12 states respectively. These four regions make up our population.

Figure 5 - Regions (frequency bar chart for West, Midwest, South, Northeast)

Part II – Regional Comparisons

Having analyzed the distributions for the four continuous variables of interest as

well as the regional breakdown, we now focus on identifying differences between regions

and determining statistical and practical significance.

The ANOVA test in figure 6.1 is used to determine variance between the regions with

regard to expenditures. We conclude from this between-subjects test that there is a

significant difference between the regions because the F ratio is calculated to be 9.75,

which is greater than the critical value for F (3,47) = 2.80.

Figure 6.1 - Tests of Between-Subjects Effects: Expenditures
Dependent Variable: Current expenditure per pupil in average daily attendance in public elem and sec schools 2005-06

Source            Type III Sum of Squares   df   Mean Square      F      p     Partial Eta Squared
region                     120,097,648.80    3   40,032,549.60    9.75   .00   .38
Error                      192,953,101.83   47    4,105,385.15
Total                    5,752,870,321.00   51
Corrected Total            313,050,750.63   50
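The F ratio and partial eta squared in figure 6.1 follow from the sums of squares and degrees of freedom in the same table. The sketch below recomputes them in plain Python; the critical value of 2.80 is taken from the text rather than computed, and the variable names are illustrative.

```python
# Sums of squares and degrees of freedom copied from figure 6.1.
ss_region = 120_097_648.80
ss_error = 192_953_101.83
df_region = 3
df_error = 47

# Mean squares are sums of squares divided by their degrees of freedom.
ms_region = ss_region / df_region
ms_error = ss_error / df_error

# F is the ratio of between-region to within-region mean squares.
f_ratio = ms_region / ms_error

# Partial eta squared: share of variance attributable to region.
partial_eta_sq = ss_region / (ss_region + ss_error)

F_CRITICAL = 2.80  # critical value for F(3, 47) as stated in the text
significant = f_ratio > F_CRITICAL

print(round(f_ratio, 2), round(partial_eta_sq, 2), significant)
```

The recomputed values agree with the table: an F ratio of 9.75 and a partial eta squared of .38.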

From figure 6.2, the multiple comparisons post-hoc Tukey HSD test, we can

conclude that the difference in expenditures is statistically significant at the .05 level when

comparing the Northeast region to all three other regions, but that it is not statistically

significant when comparing these other three regions to each other. We can expect to find

statistically similar values for expenditures when comparing states in the West, Midwest,

and South, but we can expect to find significantly increased values for expenditures in the

Northeast states.

Figure 6.2 - Multiple Comparisons: Expenditures
Current expenditure per pupil in average daily attendance in public elem and sec schools 2005-06
Tukey HSD

                                                                      95% Confidence Interval
(I) region   (J) region   Mean Difference (I-J)   Std. Error   p      Lower Bound   Upper Bound
West         Midwest                    -660.49       811.12   .85       -2820.82       1499.83
             South                      -475.96       746.52   .92       -2464.23       1512.31
             Northeast                 -4356.52*      878.61   .00       -6696.60      -2016.45
Midwest      West                        660.49       811.12   .85       -1499.83       2820.82
             South                       184.53       763.94   1.00      -1850.14       2219.21
             Northeast                 -3696.03*      893.46   .00       -6075.66      -1316.40
South        West                        475.96       746.52   .92       -1512.31       2464.23
             Midwest                    -184.53       763.94   .995      -2219.21       1850.14
             Northeast                 -3880.56*      835.25   .000      -6105.17      -1655.96
Northeast    West                       4356.52*      878.61   .00        2016.45       6696.60
             Midwest                    3696.03*      893.46   .00        1316.40       6075.66
             South                      3880.56*      835.25   .00        1655.96       6105.17

*. The mean difference is significant at the .05 level.

The mean differences in figure 6.2, comparing the expenditures of the Northeast to those

of other regions, show values that range between $3,696.03 and $4,356.52. The practical

significance of this data is that we know the expenditures per pupil in average daily

attendance in public elementary and secondary levels will be comparatively greater for the

9 Northeastern states than for the other 42 states in the nation.

Further research is needed to analyze regional education expenditures, and data

should be focused on categorical expenditures in high-percentage areas of each state

education budget, such as higher education, special education, and teacher salaries. Further

research should also examine differences between rural, urban, and suburban areas.

Let us now turn our attention to student/teacher ratios. An ANOVA test between

regions indicates an F ratio of 13.08. This is statistically significant, as it is greater than

2.80, the critical value of F (3,47). A Dunnett C test with an alpha level of .05 indicates that

the West region is significantly different from each of the other three regions, and we can

expect a greater student-to-teacher ratio. In comparing this region to other regions, the

mean differences range from 2.94 to 5.07. The Northeast has a lower ratio than the other

regions, but the difference was only statistically significant when comparing this region to

the South and the West. Figure 7 illustrates this variance in the mean ratios for each region.

Figure 7 - Descriptive Statistics: Student/Teacher Ratio
Dependent Variable: Average pupil/teacher ratio Fall 2005

region       Mean    Std. Deviation   N
West         17.81   2.90             13
Midwest      14.81   1.71             12
South        14.86   1.28             17
Northeast    12.73   1.46              9
Total        15.23   2.54             51

Because the values for the standard deviation in each region are low, we can find a

practical significance in the data and expect sample states in the region to closely represent

the population mean for that region. However, this research does not account for support

staff and other certificated teacher positions that might influence the difference between

the theoretical student/teacher ratio and the actual average number of students in a

classroom. Further research is necessary to explore the impact of such educational

positions on this ratio for each region.

Before we focus on SAT scores, it is important to observe and analyze the

data regarding the percentage of eligible students taking the SAT tests in each region. An F

ratio of 16.66, which is greater than the critical value of 2.80 for F(3,47), indicates a highly

significant difference, and this percentage may be one of the most crucial variables to consider when

making comparisons and inferences about SAT performance. The mean percentage of

students taking the SAT tests in the Northeast, as indicated in figure 8.1, was 81.44. The

means for the other three regions ranged from 12.67 in the Midwest to 40.35 in the South.

It is important to observe the standard deviation as well, which indicates a much smaller,

more predictable range for the Northeastern states, and greater fluctuation in the other

three regions.

Figure 8.1 - Descriptive Statistics: SAT Eligible
Dependent Variable: Percentage of all eligible students taking the SAT 2006-07

region       Mean    Std. Deviation   N
West         33.46   18.85            13
Midwest      12.67   16.75            12
South        40.35   30.89            17
Northeast    81.44   10.33             9
Total        39.33   31.12            51
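The Total row in figure 8.1 is the mean across all 51 states, which is a weighted average of the regional means, weighted by the number of states in each region. It is not the simple average of the four regional means. A short sketch using the table values (the dictionary layout is illustrative):

```python
# (mean percentage taking the SAT, number of states) per region, from figure 8.1.
regions = {
    "West": (33.46, 13),
    "Midwest": (12.67, 12),
    "South": (40.35, 17),
    "Northeast": (81.44, 9),
}

total_n = sum(n for _, n in regions.values())

# Weight each regional mean by its number of states.
grand_mean = sum(mean * n for mean, n in regions.values()) / total_n

# The unweighted average of the four means, shown for contrast.
unweighted_mean = sum(mean for mean, _ in regions.values()) / len(regions)

print(total_n, round(grand_mean, 2), round(unweighted_mean, 2))
```

The weighted result reproduces the reported total of 39.33, while the unweighted average is higher because the small Northeast region would count as much as the larger South.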

A Dunnett C test, figure 8.2, shows statistical significance at the .05 level when

comparing the Northeast to the other three regions, and the difference of the means in

these comparisons is greater than any other comparisons. There is also statistical

significance when comparing the very low mean percentage of the Midwest region to the

South region. It is important to note the confidence intervals for this data as well, which

indicate a large range in which the mean difference between two regions is likely to fall, but

also clearly indicate a direction for that difference in most comparisons. We can observe

the most obvious and strongest of these relationships by looking at the confidence intervals

when comparing the Northeast to any of the other regions.

Figure 8.2 - Multiple Comparisons: SAT Eligible
Percentage of all eligible students taking the SAT 2006-07
Dunnett C

                                                               95% Confidence Interval
(I) region   (J) region   Mean Difference (I-J)   Std. Error   Lower Bound   Upper Bound
West         Midwest                     20.80          7.12          -.48         42.07
             South                       -6.89          9.14        -33.35         19.57
             Northeast                  -47.98*         6.26        -67.02        -28.95
Midwest      West                       -20.80          7.12        -42.07           .48
             South                      -27.69*         8.92        -53.59         -1.79
             Northeast                  -68.78*         5.94        -87.03        -50.53
South        West                         6.89          9.14        -19.57         33.35
             Midwest                     27.69*         8.92          1.79         53.59
             Northeast                  -41.09*         8.25        -65.17        -17.01
Northeast    West                        47.98*         6.26         28.95         67.02
             Midwest                     68.78*         5.94         50.53         87.03
             South                       41.09*         8.25         17.01         65.17

*. The mean difference is significant at the .05 level.
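Because the Dunnett C output reports confidence intervals rather than p values, significance can be read from whether an interval excludes zero. The sketch below applies that rule to the six distinct region pairs in figure 8.2; the `excludes_zero` helper is an illustrative name.

```python
# (region I, region J, lower bound, upper bound), copied from figure 8.2.
intervals = [
    ("West", "Midwest", -0.48, 42.07),
    ("West", "South", -33.35, 19.57),
    ("West", "Northeast", -67.02, -28.95),
    ("Midwest", "South", -53.59, -1.79),
    ("Midwest", "Northeast", -87.03, -50.53),
    ("South", "Northeast", -65.17, -17.01),
]


def excludes_zero(lower: float, upper: float) -> bool:
    """True when both bounds share a sign, so a zero difference is outside the interval."""
    return lower > 0 or upper < 0


significant_pairs = [
    (i, j) for i, j, lower, upper in intervals if excludes_zero(lower, upper)
]

for pair in significant_pairs:
    print(pair)
```

The four pairs flagged this way are the same ones marked with an asterisk in the table. The West and Midwest interval narrowly includes zero, which is why that comparison is not marked.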

The practical significance of this data lies in the impact that these percentages will

have on the generalizability of conclusions about SAT scores and any correlations involving

this data. This data raises significant questions about the South, the West, and especially

the Midwest regions in terms of whether or not the sample groups taking the SAT tests are

representative of the population.

Further research is needed to determine if the means being reported for regional

SAT scores are being skewed by the percentages and the high degree of variance between

regions as indicated in this Dunnett C test. Further research might focus on the correlation

between an eligible student’s GPA and whether or not they take the SAT tests, or

percentages of SAT test takers and enrollment in higher education.

We can now focus on SAT scores in writing. We can observe from the data in figure

9.1 that the mean SAT score was highest in the Midwest with a value of 564.17 and lowest

in the Northeast with a value of 497.22. The West and the South were clustered near

the mean total for the nation, which has a value of 525.37. It is important to note that the

standard deviation is much lower in the Northeast region, indicating a lesser amount of

variance between scores.

Figure 9.1 - Descriptive Statistics: SAT Scores in Writing
Dependent Variable: Average writing SAT score 2005-06

region       Mean     Std. Deviation   N
West         515.00   25.29            13
Midwest      564.17   31.04            12
South        520.82   39.19            17
Northeast    497.22   11.24             9
Total        525.37   37.63            51

Figure 9.2 shows statistical significance at the .05 level when comparing the

Midwest with any of the other three regions. The mean difference ranges between

43.34 and 66.94, and indicates a higher statistical probability of greater scores in the

Midwest region. However, the confidence intervals for this data are expansive, indicating a

high degree of variance when determining the mean scores between states from different

regions.

Figure 9.2 - Multiple Comparisons: SAT Scores in Writing
Average writing SAT score 2005-06
Dunnett C

                                                               95% Confidence Interval
(I) region   (J) region   Mean Difference (I-J)   Std. Error   Lower Bound   Upper Bound
West         Midwest                    -49.17*        11.38        -83.24        -15.10
             South                       -5.82         11.81        -40.07         28.42
             Northeast                   17.78          7.95         -6.24         41.80
Midwest      West                        49.17*        11.389        15.10         83.24
             South                       43.34*        13.06          5.06         81.63
             Northeast                   66.94*         9.71         37.43         96.46
South        West                         5.82         11.81        -28.42         40.07
             Midwest                    -43.34*        13.06        -81.63         -5.06
             Northeast                   23.60         10.22         -6.10         53.30
Northeast    West                       -17.78          7.95        -41.80          6.24
             Midwest                    -66.94*         9.71        -96.46        -37.43
             South                      -23.60         10.22        -53.30          6.10

*. The mean difference is significant at the .05 level.

The practical significance of this data is questionable, based on the issues previously

raised regarding the percentages of eligible students taking the SAT tests in each region. It

is beyond the scope and limitations of this data to infer causation between these variables.

Any conclusions drawn from the initial report involving SAT scores may be distorted

significantly by a gross sampling error of the students taking SAT tests in various states and

regions. We take a closer look at correlations between these and other variables in Part III.

Part III – Correlation, Scatterplots, and Regression Equations

Here we look to understand correlation and analyze the relationship between

variables by looking at scatterplots and regression equations. This analysis continues to

focus on SAT scores as the dependent variable. The independent variables are

Expenditures, Student/Teacher Ratio, and the percentage of Eligible Students who take the

SAT tests. Each scatterplot shows a negative relationship, as indicated by the downward

slope of the line of best fit. The correlation coefficient will be measured for statistical

significance against the critical value at the .05 alpha level of 0.279, which coincides with

48 degrees of freedom - the closest value we can obtain to our actual calculated df value of

49. It is important to remember that statistical significance is determined by the magnitude of the

correlation coefficient as it approaches +/- 1.00, a perfect correlation. Even if a value is

negative, its absolute value can be greater than our critical value. It is also important to remember that we

are looking to make determinations about statistical and practical significance through

correlation, not looking to draw conclusions about causation. Let us look at our first

scatterplot.
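The critical value of 0.279 can be reproduced from the critical t value through the standard relationship r = t / sqrt(t^2 + df). The sketch below is not the paper's method, which reads the value from a table. The t value of 2.0106 for 48 degrees of freedom is an assumption taken from a standard two-tailed .05 t table, and the function names are illustrative.

```python
import math


def critical_r(t_critical: float, df: int) -> float:
    """Convert a two-tailed critical t value into the matching critical |r|."""
    return t_critical / math.sqrt(t_critical ** 2 + df)


# Assumption: 2.0106 is the two-tailed .05 critical t for 48 degrees of
# freedom from a standard t table; the paper does not state it.
T_CRITICAL_DF48 = 2.0106
r_critical = critical_r(T_CRITICAL_DF48, 48)


def is_significant(r: float, threshold: float = r_critical) -> bool:
    """Compare the magnitude of r, not its signed value, with the threshold."""
    return abs(r) > threshold


print(round(r_critical, 3))
```

Comparing `abs(r)` with the threshold is what allows a negative coefficient to count as exceeding the critical value.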

Figure 10 - Expenditures and SAT Scores in Writing (scatterplot; x-axis: Expenditures, y-axis: SAT Scores)

Line of best fit: f(x) = -0.00596941837135119x + 587.023414437313

Regression Equation: y = -0.006x + 587.02
Coefficient of Determination: r^2 = 0.1576
Coefficient of Correlation: r = -0.3969

In figure 10, we see the slight negative correlation between Expenditures and SAT scores.

The slope is -0.006. The y-intercept is 587.02. The correlation coefficient of -0.397 is greater in magnitude than

our critical value of 0.279, indicating statistical significance. This relationship is weak because

the correlation coefficient has a low value. Most of the values on the scatterplot are clustered

between $8,000 and $12,000 and do not fall very close to the line of best fit. Though statistically

significant, Expenditures are not a strong predictor of SAT Scores, and this correlation has limited

practical significance. Let us turn our attention to our second set of variables.

Figure 11 - Student/Teacher Ratio and SAT Scores in Writing (scatterplot; x-axis: Student/Teacher Ratio, y-axis: SAT Scores)

Line of best fit: f(x) = -0.987225596120366x + 540.337727614915

Regression Equation: y = -0.9872x + 540.34
Coefficient of Determination: r^2 = 0.0044
Coefficient of Correlation: r = -0.0662

In looking at the relationship between Student/Teacher Ratio and SAT Scores, we can see in

figure 11 that there is a slight negative correlation. The regression equation calculates the slope at

-0.987 and the y-intercept at 540.34. The slight downward direction of the line of best fit again

shows the negative relationship between these two variables. The correlation coefficient is

-0.066. Because the absolute value of this coefficient is less than our critical value of 0.279, we fail to reject the null hypothesis that

there is no relationship between these two variables. Let us look at our last scatterplot.

Figure 12 - Eligible and SAT Scores in Writing (scatterplot; x-axis: Eligible, y-axis: SAT Scores)

Line of best fit: f(x) = -1.05161947499553x + 566.736248369432

Regression Equation: y = -1.0516x + 566.74
Coefficient of Determination: r^2 = 0.7565
Coefficient of Correlation: r = -0.8698

Figure 12 illustrates the negative correlation between the percentage of students

who graduate high school and take the SAT tests and performance on SAT tests in writing.

The correlation coefficient for this relationship is -0.870. In absolute value, it is much greater than

our critical value of 0.279. It indicates a strong correlation between these two variables as

it approaches -1.00. As we look at the data values along the line of best fit, we can see that

they are either touching the line, or very close to it. The regression equation calculates the

slope to be -1.052 and the y-intercept to be 566.74.
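The three regression equations can be used the same way: the correlation coefficient is the square root of the coefficient of determination with the sign of the slope, and the line of best fit gives a predicted score for any value of the independent variable. The sketch below uses the line-of-best-fit coefficients and r^2 values reported with figures 10 through 12; the helper names and the example inputs of 20% and 80% are illustrative only.

```python
import math

# (slope, intercept, r squared) as reported with figures 10, 11, and 12.
fits = {
    "Expenditures": (-0.00596941837135119, 587.023414437313, 0.1576),
    "Student/Teacher Ratio": (-0.987225596120366, 540.337727614915, 0.0044),
    "Eligible": (-1.05161947499553, 566.736248369432, 0.7565),
}


def signed_r(slope: float, r_squared: float) -> float:
    """Correlation coefficient: square root of r squared, carrying the slope's sign."""
    return math.copysign(math.sqrt(r_squared), slope)


def predict(slope: float, intercept: float, x: float) -> float:
    """Predicted SAT writing score on the line of best fit."""
    return slope * x + intercept


for name, (slope, intercept, r_squared) in fits.items():
    print(name, round(signed_r(slope, r_squared), 3))

# Illustrative inputs: predicted scores when 20% and 80% of eligible students take the SAT.
slope, intercept, _ = fits["Eligible"]
print(round(predict(slope, intercept, 20), 1), round(predict(slope, intercept, 80), 1))
```

Because the reported r^2 values are rounded to four decimals, the recovered coefficients match the reported r values only to about three decimal places. The predictions describe the fitted line, not a causal effect of participation on scores.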

Conclusions

There is a concerning degree of skewness to the variables evaluated in Part I. While

sample error is expected, this data shows enough of a departure from normality to suggest

that some of the sample means are not representative of the population. This skewness was

most obvious in the SAT scores. By evaluating the correlations through scatterplots in Part

III, we can see that there is a strong relationship between the percentage of eligible

students who take the SAT tests and student performance on the SAT writing tests. Further

research is needed to investigate this relationship. It is necessary to evaluate those states

that have a very low percentage of students taking the SAT tests, yet report a high mean

score on the SAT test in writing, to determine if the mean score of the sample is

representative of the population, or if it is severely skewed data. In Part II, it was shown

that there was a high degree of variability in the percentages of students in different states

and different regions who go on to take the SAT tests. This analysis of the data related to

SAT eligibility percentages and scores raises serious concerns about external validity that

require further research and evaluation.
