Math 146, Fall 2019 Instructor – Linda C. Stephenson Chapter 2 Assignment
Page 1 of 16
Chapter 2 Assignment (due Thursday, October 3)
Introduction: The purpose of this assignment is to analyze data sets by
creating histograms and scatterplots. You will use the STATDISK program for
both.
Therefore, you should start by downloading the STATDISK Version 13 program.
A link to the textbook website is given on the “Links and Information” page of our
class webpage, plus I did a demo in class.
To start the program, just double-click on the “Statdisk.exe” file (or icon), and it
should open looking approximately like this:
There are datasets preloaded into the program. To open them, just go to Data
Sets/Elementary Statistics 13th Edition, and pick a file. For example, I will
open up the ‘Bear Measurements’ dataset to use as an example throughout this
assignment file (Data Set no. 9). You might want to open this file and run
through it with the examples as practice.
You can click and drag the corner of the data box to make the box bigger. Also,
if needed, you can scroll over to the rest of the columns using the scroll bar at the
bottom of the window (for example, there are 9 columns of data in this file).
There are two parts to the assignment:
Part A: Frequency Distributions and Histograms
Part B: Scatterplot
Part A Frequency Distributions and Histograms
The data that you will be analyzing for this assignment deals with the literacy
rates (measurement unit: %) for 165 countries in the world. One set of data
applies to the Male population of each country, and the second set of data
applies to the Female population of each country. The rates generally apply to
people ages 15 and over, and the data is current as of different years for various
countries, but most estimates were from 2015.
The source of data is the CIA World Factbook, at the following website:
https://www.cia.gov/library/publications/the-world-factbook/
From the website, here is their definition of the data:
The data for the assignment is given in the Excel file: literacy_by_country_CIA
I provided three different sheets of data in the Excel file, and the tabs for each
sheet are located at the bottom of the spreadsheet. The data that you want to
copy over to STATDISK is on the “STATDISK” sheet. The other two sheets are
for your information only, and contain the respective data sorted in order from
smallest to largest percentages of literacy, along with the corresponding country
names. You will probably want to consult these sorted data lists when you are
answering the questions.
You are going to create two different frequency distributions and histograms for
this part of the assignment, using the data in the Excel file.
You need to copy the data from the Excel file into STATDISK. Start STATDISK;
all of the data columns should be empty. In Excel, make sure that you are on
the “STATDISK data” tab; the tabs are labeled at the bottom of the spreadsheet.
You can copy both columns of data over at the same time. All you want to copy
over are the numbers in columns B and C; you do not want to copy over the
country names or the column headers. Therefore, in the Excel file, click on cell
B2, which is the first data value (52). Keeping your mouse button down, drag all
the way down and over to cell C166, which is the last data value (84.6). Now
when you release the mouse button, the columns of numbers should be
highlighted. Then just hit Ctrl c (shortcut command for copying): hold the
Control button down on your keyboard, and hit the letter c at the same time.
Now transfer over to STATDISK, click in the very upper left cell (row 1 column 1),
and hit Ctrl v (shortcut command for pasting). Both columns of numbers should
now be pasted into STATDISK. Scroll down to confirm that you did get all 165
rows of numbers copied over in both columns. I would suggest that you go to
Data Tools/Edit Column Titles, and add appropriate column headers for the data
(male and female), just to keep them straight.
Making the Frequency Distributions and Histograms
In both cases, use the continuous data method to set up your data classes.
Use the following starting lower class limits and class widths for the
frequency distributions:
Dataset                             Starting Lower Class Limit    Class Width
Male Literacy Rates (Column 1)      20                            10
Female Literacy Rates (Column 2)    10                            10
Fill in all of the details of the frequency distributions on the Answer Sheet given at
the end of the assignment. To set up the class limits, notice that both columns of
data are given to the first decimal place, so write your class limits accordingly.
Use the histogram (instructions below) to find the frequencies for each data
class, just by checking the ‘Bar labels’ button at the bottom of the histogram.
Disclaimer: I noticed that sometimes if you change something in the histogram
and replot it, the bar labels (frequencies) disappear. Just uncheck and recheck
the ‘Bar labels’ button and they should reappear.
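As a rough illustration of how the class limits can be written out, here is a minimal Python sketch. It is not part of STATDISK. It assumes that each upper class limit sits one data unit (0.1) below the next lower class limit, which is the usual convention for data recorded to the first decimal place. The function name class_limits and the choice of 8 classes are made up for the example; use as many classes as you need to cover your largest data value.

```python
from decimal import Decimal


def class_limits(start, width, n_classes, unit="0.1"):
    """Return (lower, upper) class limits as strings with one decimal place.

    Assumption: the upper limit is one data unit (0.1) below the next
    lower limit, because the data are recorded to the first decimal place.
    """
    start = Decimal(str(start))
    width = Decimal(str(width))
    unit = Decimal(unit)
    limits = []
    for i in range(n_classes):
        lower = start + i * width
        upper = lower + width - unit
        limits.append((f"{lower:.1f}", f"{upper:.1f}"))
    return limits


# Male literacy rates: starting lower class limit 20, class width 10.
# The number of classes (8) is only a placeholder for this sketch.
male_classes = class_limits(20, 10, 8)
for lower, upper in male_classes:
    print(f"{lower} - {upper}")
```

Decimal arithmetic is used here only so that the limits print exactly (for instance 29.9 rather than a rounded floating-point value).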
If you want to, you can verify the frequencies by counting them up by hand (like
we did in class), by looking at the sorted data. If you need to sort data (smallest
to largest), go to Data Tools/Sort Data. Then select Sort/One column, pick the
column number, leave the order from “A to Z” and hit the ‘Sort’ button. You don’t
need to do this in this case, since your histogram has counted them up for you
(plus I already sorted them in Excel, given on separate sorted tabs of the
spreadsheet), but it’s handy to know how to sort the data in STATDISK, and you
may want to use this in the future for quizzes or assignments. Also, you will need
to know how to do this process by hand on the test.
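The hand count described above, together with the relative frequencies asked for on the Answer Sheet, can be sketched in Python as follows. This is only an illustration under stated assumptions: the sample values are made up (they are NOT the assignment data), and the function name frequency_table and the number of classes are hypothetical.

```python
def frequency_table(values, start, width, n_classes):
    """Return one (lower_limit, frequency, relative_frequency_percent) row per class."""
    counts = [0] * n_classes
    for value in values:
        index = int((value - start) // width)  # which class the value falls in
        if not 0 <= index < n_classes:
            raise ValueError(f"{value} is outside the classes")
        counts[index] += 1
    total = len(values)
    return [
        (start + i * width, count, round(100 * count / total, 2))
        for i, count in enumerate(counts)
    ]


START, WIDTH = 20, 10

# Made-up literacy rates (%), NOT the assignment data.
sample = [23.5, 38.4, 45.2, 71.3, 80.0, 88.8, 99.0, 99.9]
table = frequency_table(sample, START, WIDTH, n_classes=8)

for lower, count, rel in table:
    upper = lower + WIDTH - 0.1
    print(f"{lower:.1f} - {upper:.1f}   {count:3d}   {rel:6.2f}%")
```

Each relative frequency is the class count divided by the total number of data values, times 100, rounded to two decimal places. A quick check is that the frequencies add up to the number of data values and the relative frequencies add up to about 100%.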
To create the histograms, go to Data/Histogram. Select the column of data that
you want to plot, and then hit the ‘Plot’ button. STATDISK will auto-fit the plot
options for you, and generate a histogram. For example, I am plotting the head
length of the bears (column 4).
In most cases, you will want to define your own plot options, including the Class
width and the Class start (starting lower class limit). To do this, click on the “User
defined” button. Enter the correct class width and starting lower class limit (which
I specified above for this assignment), and hit ‘Plot’. For example, for the bear’s
head length histogram, I will change the starting lower class limit to 4.
Note: anytime you change the input options for the histogram (class width
or class start), you need to hit the Plot button at the top again!
Finally, for all histograms, you need to add a title and an x-axis label, including
any appropriate units of measurement. Use the ‘Title’ and ‘x Label’ boxes at the
upper right.
Now you should have a nice histogram with the title and x-axis label displayed.
For example (see above):
Print out each of the histograms. Do NOT use the ‘Print’ or ‘Copy’ button at
the bottom of the STATDISK histogram box. The problem is that these
commands only copy the plot itself (the part with the white background), and not
the column on the right showing the input plot options. I want to see these!
Instead, you can use an alternative method, especially if you want to print more
than one graph per page. This method is described separately in the file
“Printing Out Applet Output”, which is on the Assignment webpage. Wherever it
says “applet”, just think “STATDISK” instead; the process is exactly the same. I
will demonstrate this in class also, if I haven’t already done so.
Finish this portion of the assignment by answering the questions given on the
Answer Sheet.
Part B Scatterplots
You are going to create one scatterplot.
The purpose of making scatterplots is to investigate whether or not there is a
relationship between the two variables. Where applicable, you will confirm a
linear relationship (or not) by running a linear correlation and regression analysis.
From the bear data, for example, here is a plot of a bear’s weight as a function of
its chest measurement. From the graph, there appears to be a strong linear
relationship between the two variables, since the data points approximately form
a line.
We can confirm this by running a Correlation and Regression analysis in
STATDISK.
Go to Analysis/Correlation and Regression, select the correct columns for the
x variable and the y variable, and then hit the ‘Evaluate’ button.
In this case, the calculated correlation coefficient r = 0.963, which indicates a
very strong positive linear correlation. It is much higher than the critical r = 0.268,
which confirms the linear correlation.
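To make the comparison concrete, here is a minimal Python sketch of the standard (Pearson) linear correlation coefficient and the decision rule used above. It is not STATDISK’s own code. The function names and the toy data are made up for illustration; the values 0.963, 0.171 and 0.268 are the ones quoted in this handout for the bear examples.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired data (x, y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def supports_linear_correlation(r, critical_r):
    """Decision rule: r must be farther from 0 than the critical r."""
    return abs(r) > critical_r


# Toy paired data (made up): the points lie exactly on a line.
toy_r = pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(f"toy r = {toy_r:.3f}")

# Values quoted in this handout for the two bear examples.
print(supports_linear_correlation(0.963, 0.268))  # weight vs. chest
print(supports_linear_correlation(0.171, 0.268))  # weight vs. month
```

The absolute value matters because a strong negative correlation (r close to -1) also counts as a linear correlation.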
On the other hand, this plot of the bear’s weight as a function of month of
measurement does NOT seem to show a strong linear correlation between the
two variables. The data is scattered about in no particular pattern, which makes
sense, because it is the weight plotted as a function of the month in which the
measurement was taken. There’s no reason for these two variables to be
correlated!
The correlation and regression analysis confirms that there is no linear
correlation, since the calculated correlation coefficient (r = 0.171) is less than
(closer to 0 than) the critical r (critical r = 0.268).
Also note – a scatterplot may reveal that there IS a relationship between the two
variables, but that it is clearly NOT a linear relationship. In this case, it would
NOT be appropriate to run a linear correlation and regression analysis, because
a visual inspection of the data has already shown that the data is clearly not
linear.
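A small made-up example shows why the visual check comes first. In the Python sketch below (an illustration only; the data and the function name are hypothetical), y is completely determined by x through y = x squared, yet the linear correlation coefficient comes out as 0 because the relationship is a curve rather than a line.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient r for paired data (x, y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Made-up data: a perfect U-shaped (quadratic) relationship.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

curved_r = pearson_r(xs, ys)
print(f"r = {curved_r:.3f}")
```

An r near 0 here does not mean “no relationship”; it only means “no linear relationship”, which is why the scatterplot has to be inspected before running the analysis.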
Scatterplot: Female Literacy Rates as a Function of Male Literacy Rates for
World Countries
You are going to make a scatter plot of the paired data (male, female) or (x, y) for
each country in the table. In other words, you will plot female literacy rates as a
function of male literacy rates (both in %). The point is to investigate whether
there is a relationship between the two variables.
Use the same two columns of data that you already copied over from Excel.
To create the scatterplot, go to Data/Scatterplot. Select the two columns of data
you want to plot, then hit the ‘Plot’ button. Note that you want to choose ‘Female’
(which is in Excel column C and STATDISK column 2) as the Y Value variable,
and ‘Male’ (which is in Excel column B and STATDISK column 1) as the X Value
variable (the independent variable). UNCHECK the “Visible” box shown at the
bottom of the plot (see figure on previous page) to get rid of the green line.
The program will automatically plot a green regression line whether or not there
is a linear correlation, so it can be very deceiving. I noticed that I sometimes had
to check and uncheck the box a couple of times to get rid of the line, so just play
around with that.
Add a title and both an x- and y-axis label every time you make a scatterplot. I
noticed that when I typed them in, I had to immediately hit the ‘Enter’ key on my
keyboard, and then it would appear. Make sure that your title correctly reflects
which variable is a function of the other variable. Also always make sure that
you have included the correct units of measurement for both the x and y axes. In
this case, the unit of measurement for both variables is percent (%).
Print out the scatterplot, using one of the methods described in “Printing Out
Applet Output”.
Run a correlation and regression analysis on the two variables, as described
above. Snip and save the output for printing.
CAUTION! NEVER SORT the data when you are making scatterplots! The
scatterplots consist of paired (x, y) data, so if you sort the columns individually,
then the pairs will no longer exist, and the data will be meaningless when you
plot it.
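The following Python sketch illustrates the caution above with a few made-up (male, female) pairs, which are NOT the assignment data. Sorting each column on its own produces pairs that never existed in the original table.

```python
# Made-up (male %, female %) pairs, one per hypothetical country.
pairs = [(90.0, 70.0), (60.0, 65.0), (99.0, 99.0), (75.0, 50.0)]

male = [m for m, f in pairs]
female = [f for m, f in pairs]

# WRONG for scatterplots: each column is sorted independently,
# so values from different countries get matched up.
broken = list(zip(sorted(male), sorted(female)))

# For comparison: reordering whole rows keeps every pair intact.
kept = sorted(pairs)

print("original:", pairs)
print("broken:  ", broken)
print("kept:    ", kept)
```

In the broken list, for example, the male rate 60.0 ends up paired with the female rate 50.0, even though no country in the original list had that combination.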
WHAT YOU NEED TO TURN IN:
1. Printout of two histograms. Don’t forget to add a title and an x-axis label,
including the measurement units where applicable. The units in this case
are just percent (%).
2. Printout of the scatterplot. Don’t forget to include a title and x- and y-axis
labels, including the measurement units.
3. Printout of one correlation and regression analysis.
4. Fully completed Answer Sheet (5 pages total).
Note: you do NOT need to turn in all the pages of instructions.
Math 146, Fall 2019 Name: Chapter 2 Assignment
Page 12 of 16
Answer Sheet
Part A Frequency Distributions and Histograms
Frequency Distribution for: Male Literacy Rates
Note: Write the data classes as: lower class limit – upper class limit, to the first decimal place level of accuracy.
Given: Starting lower class limit = 20    Class width = 10
Fill in ALL of the boxes:
Data Classes (% literate)    Frequency (Count)    Relative Frequency* (%)
* Note: calculate the relative frequencies (in %) to TWO decimal places. Do
NOT just use the values from the STATDISK histogram, because they are
rounded off to the nearest integer.
Frequency Distribution for: Female Literacy Rates
Note: Write the data classes as: lower class limit – upper class limit, to the first decimal place level of accuracy.
Given: Starting lower class limit = 10    Class width = 10
Fill in ALL of the boxes:
Data Classes (% literate)    Frequency (Count)    Relative Frequency* (%)
* Note: calculate the relative frequencies to TWO decimal places. Do NOT just
use the values from the STATDISK histogram, because they are rounded off to
the nearest integer.
Part A Questions: Histograms
1. a. Describe the distribution (shape) of the data for the male literacy histogram.
   b. Describe the distribution (shape) of the data for the female literacy histogram.
2. Note that the histograms have a similar shape. WHY do you think the distributions are shaped like this? What could be some possible explanations? Consider the underlying causes, not the “mechanics” of the histograms. In particular, you might want to look at the sorted lists of data in Excel, and consider which countries are on which end of the graph.
3. There is one particular difference between the literacy rates for males and females in countries of the world. Look carefully at the histograms and notice that the x-axis scales are different for the two histograms, so account for that when you are comparing them. Who tends to have higher rates of literacy, males or females, and where (what parts of the world) are the differences the most pronounced?
4. Refer back to the difference that you identified in question 3 between the literacy rates of males and females. Do a little research to investigate specifically why this difference exists, and describe your results here. Also, list your source(s) of information below. Feel free to type your answer and attach if that is more convenient.
Source: (list website link):
Part B Scatterplot
5. Look at the scatter plot of Female literacy rates as a function of Male literacy rates.
From the graph, does there appear to be a linear relationship between the two variables? (yes or no)
If there does appear to be a linear relationship, how STRONG does the relationship appear to be?
Does the correlation and regression analysis indicate that there is a linear relationship between the two variables? (yes or no)
List: Calculated r =
Critical r =
Briefly explain why you think this is the case. (Not why you came to the conclusion based on the plot, but what the underlying cause may be for the relationship or lack thereof.) Why is there or is there not a relationship between the male and female literacy rates for each country in the world?