Post on 30-Sep-2020
Jennifer Porter
Math 1040 – Summer 2017
Term Project
July 28, 2017
Skittles Candies Project
The following project will consist of many of the concepts I have learned over the course of this Math
1040 class. Some of the concepts I will be covering will include organizing and analyzing data, drawing
conclusions using confidence intervals and hypothesis tests. In the information below you will see the
collection of my own data as well as the overall class sample data. My data was collected along with all of
the other students in class, and combined into one data set that will be used throughout this project.
               Count Red   Count Orange   Count Yellow   Count Green   Count Purple   Total
My Bag                13             19             17             7              8      64
Class Counts         340            370            363           346            342    1761
The pie charts above mostly reflect what I expected to see when comparing my data to the class data.
Because orange and yellow were so common in my own bag of Skittles, I figured they would be among the
highest overall in the class as well, and that prediction was correct. However, I also assumed that red
would be one of the most common colors, and I was surprised by the results: red had the lowest count of
all the colors in the class totals (340). I assumed purple and green would be the least represented colors,
and they were indeed low (342 and 346), although red came in slightly below both. You can see in the Pareto
charts below that the overall class sample is fairly even across the five colors, with red being the least
common and orange the most common color in the bags of Skittles.
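The comparison between my bag and the class sample can be reproduced with a short script. This is only a sketch using the counts from the table in this report; the variable and function names are my own and are not part of the assignment.

```python
# Color counts taken from the table in this report.
colors = ["Red", "Orange", "Yellow", "Green", "Purple"]
my_bag = [13, 19, 17, 7, 8]
class_counts = [340, 370, 363, 346, 342]


def proportions(counts):
    """Return each count as a fraction of the total."""
    total = sum(counts)
    return [c / total for c in counts]


my_props = proportions(my_bag)
class_props = proportions(class_counts)

# Side-by-side percentages, as a pie chart would show them.
for color, mine, cls in zip(colors, my_props, class_props):
    print(f"{color:<7} my bag: {mine:6.1%}   class: {cls:6.1%}")

# Colors ordered from most to least common in the class sample (Pareto order).
pareto_order = sorted(zip(colors, class_counts), key=lambda pair: pair[1], reverse=True)
print([color for color, _ in pareto_order])
```

Sorting by count is what turns a plain bar graph into a Pareto chart, which is why the last step orders the colors from most to least common.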
Summary Statistics for the Total Candies in Each Bag:
Mean is 60.7
Standard Deviation is 3.3
5-number Summary is 55, 58, 61, 62, 70; Minimum= 55, Q1=58, Median=61, Q3=62, Maximum=70
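One common way to check whether the maximum of 70 counts as an outlier is the 1.5 × IQR rule. The sketch below assumes that rule (the report does not say which rule was used) and works only from the five-number summary above; the names are illustrative.

```python
# Five-number summary reported above for total candies per bag.
minimum, q1, median, q3, maximum = 55, 58, 61, 62, 70

iqr = q3 - q1                 # interquartile range
lower_fence = q1 - 1.5 * iqr  # values below this are flagged as outliers
upper_fence = q3 + 1.5 * iqr  # values above this are flagged as outliers


def is_outlier(value):
    """Apply the 1.5 * IQR rule (an assumed convention, not stated in the report)."""
    return value < lower_fence or value > upper_fence


print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
print("70 is an outlier:", is_outlier(maximum))
print("64 is an outlier:", is_outlier(64))
```

Under this rule the upper fence is 68, so the bag of 70 is flagged while a bag of 64 is not.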
1. The histogram and boxplot above show the distribution of the total candies per bag for the
classroom sample. In both graphs, you can see that the shape of the distribution is skewed to the right.
One bag of Skittles had 70 candies in it, which is higher than all of the other bags and makes it an
outlier for the class sample. This bag of 70 candies pulled the mean to the right of the graph.
The mean is 60.7, and you can clearly see on the histogram that this is where the majority of the bag
totals for the classroom fall. I did not expect to see a gap in the histogram between 66
and 70, but gaps like that can happen in a random sample. The classroom
data is fairly close to my own bag of Skittles but does not agree with it exactly. My bag had 64 candies in
it, which is more than most of the class. I was also the only one to have 64 candies in my bag, which I
found interesting as well. My bag of 64 candies is closest to the third quartile (62), but it was still two
away from that.
2. Categorical variables are also known as qualitative variables. They are variables that are not numerical
and describe data that fit into categories. One example would be eye color, where the values would
include blue, green, brown, and hazel. Quantitative data is data that can be ordered and measured. The
number of candies in a bag of Skittles is quantitative, but the color of the actual candy is categorical or
qualitative. Pareto charts, bar graphs, and pie charts are the best ways to graph categorical or
qualitative data. The best ways to graph quantitative data are histograms, stem-and-leaf plots, bar graphs,
boxplots, and dot plots. All of these graphs can be used to measure the quantity of a certain variable. A
bar graph could work for both quantitative and qualitative data; however, a pie chart only makes sense for
qualitative data since it compares categories to the whole. It would effectively show the percentage
of each color of Skittles in a bag (qualitative data), but not the number of Skittles in a bag (quantitative
data).
Calculation-wise, the mean and median only make sense for quantitative data. The mean represents
the average quantity of something in an entire sample. For example, the mean for the class
sample in this project is 60.7 candies per bag, so it is best applied to quantitative data. The median
represents the middle value of a data set (in this case the median is 61) and also makes the most
sense when applied to quantitative data. The mode is the best measure of central tendency to use with
categorical or qualitative data. You may not be able to find the average color or the median
color of the candy in a Skittles bag, but you can establish which color occurs most often.
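To illustrate why the mode is the natural summary for color, here is a small sketch that treats each candy in my bag as one categorical observation and finds the most frequent value. The counts come from the table in this report; expanding them into a list of labels is just one way to set this up.

```python
from collections import Counter

# My bag, using the color counts from the table in this report.
my_bag_counts = {"Red": 13, "Orange": 19, "Yellow": 17, "Green": 7, "Purple": 8}

# Expand the counts into one label per candy, i.e. the raw categorical data.
candies = [color for color, n in my_bag_counts.items() for _ in range(n)]

# The mode is simply the category that appears most often.
color_counts = Counter(candies)
mode_color, mode_count = color_counts.most_common(1)[0]

print(f"{len(candies)} candies, mode = {mode_color} ({mode_count} candies)")
```

There is no comparable calculation for a mean or median here, because color labels cannot be added together or placed in numerical order.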
Confidence Interval
A confidence interval is a range (or an interval) of values used to estimate the true value of a population
parameter. Statisticians use a confidence interval to describe the amount of uncertainty associated with a sample
estimate of a population parameter. The range of the confidence interval is defined by the sample statistic plus
or minus the margin of error (confidence interval = sample statistic ± margin of error). A confidence
interval allows us to estimate the range in which the true population parameter falls, given what we
know about the population from the sample we have observed. Because no estimate can be 100 percent reliable,
we need to know how much confidence we can place in our estimates. For example, if repeated samples were taken
and the 95% confidence interval was determined for each sample, about 95% of the intervals would contain the
population mean. Confidence intervals can be computed for various parameters, not just the mean.
1.)
n = 1761
x = 363
p̂ = 363/1761 ≈ 0.206
z_(α/2) = 2.576
99% confidence that the population proportion of yellow candies is between 0.181 and 0.231
2.)
x̄ = 60.7
s = 3.3
n = 29
t_(α/2) = 2.048
95% confidence that the population mean for number of candies per bag is between 59.4 and 61.9
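Both intervals follow the "sample statistic ± margin of error" pattern and can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names are my own, the z critical value is computed from the standard normal distribution, and the t critical value (2.048, with 28 degrees of freedom) is copied from the values listed above rather than computed. Because the mean and standard deviation are entered already rounded, the endpoints of the second interval may differ slightly in the last digit from the ones reported.

```python
from math import sqrt
from statistics import NormalDist


def proportion_interval(x, n, confidence):
    """z-interval for a proportion: p_hat +/- z * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = x / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-tailed critical value
    margin = z * sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


def mean_interval(x_bar, s, n, t_crit):
    """t-interval for a mean: x_bar +/- t * s / sqrt(n)."""
    margin = t_crit * s / sqrt(n)
    return x_bar - margin, x_bar + margin


# 1.) Proportion of yellow candies in the class sample, 99% confidence.
yellow_low, yellow_high = proportion_interval(x=363, n=1761, confidence=0.99)
print(f"99% CI for proportion of yellow: ({yellow_low:.3f}, {yellow_high:.3f})")

# 2.) Mean number of candies per bag, 95% confidence.
# The t critical value is taken from the report (df = 28), not computed here.
mean_low, mean_high = mean_interval(x_bar=60.7, s=3.3, n=29, t_crit=2.048)
print(f"95% CI for mean candies per bag: ({mean_low:.2f}, {mean_high:.2f})")
```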
Reflective Writing
Throughout this class I have learned many new concepts in statistics. I still remember looking at the instructions
for this project and thinking to myself that I was reading a foreign language. In the beginning it seemed like it
would be a very difficult project. However, as the class went on and I began to understand the concepts from each
module better, I realized that I was not only capable of gradually understanding the instructions, but I was also
able to perform the correct sequence of steps for each of the exercises in the project. We had to create a random
sample of data, have that data organized, create graphs and charts, and interpret what the information meant. As in
any other class, after learning the theory, practice makes the difference.
This project allowed us to put into practice key principles studied throughout the term, from using a sampling
method to performing hypothesis testing. The most challenging aspects of the project were really understanding each
concept and how it applied to the population of Skittles, not just our sample. This project has given me a better
understanding of statistics and how it works in the real world. Statistics provides valuable information to society.
Since I work in the field of GIS, I have found this class and statistics very beneficial to my job.
This project and the class in general have given me the tools to differentiate between valid and questionable
information when analyzing things like graphs, confidence levels, and confidence intervals. I am now able to
understand the language behind the statistical analysis of studies, including simple but important terms such as
median, range, mean, and mode that are used so frequently in so many instances.
Throughout this course I have learned how to apply statistical techniques in daily life situations and
how to use a great program, StatCrunch, to create visual representations of the data. I was able to create graphs
and find statistical data by simply inputting the data given. I have always struggled with math but found that this
course allowed me to see how math does apply to real life.