Sample vs. Population
Population Sample
Descriptive Statistics
• Organize
• Summarize
• Simplify
• Presentation of data
Describing data
Descriptive Statistics
Descriptive Statistics are Used by Researchers
to Report on Populations and Samples
Descriptive Statistics
Types of descriptive statistics:
Organize Data
Tables
Graphs
Summarize Data
Central Tendency
Variation
Descriptive Statistics
Types of descriptive statistics:
Organize Data
Tables
Frequency Distributions
Relative Frequency Distributions
Graphs
Bar Chart or Histogram
Frequency Distributions
After collecting data, the first task for a researcher is to organize and simplify the data so that it is possible to get a general overview of the results.
This is the goal of descriptive statistical techniques.
One method for simplifying and organizing data is to construct a frequency distribution.
Frequency Distributions
A frequency distribution is an organized
tabulation showing exactly how many individuals
are located in each category on the scale of
measurement. A frequency distribution presents
an organized picture of the entire set of scores,
and it shows where each individual is located
relative to others in the distribution.
Frequency Distribution Tables
A frequency distribution table consists of at least two columns - one listing categories on the scale of measurement (X) and another for frequency (f).
In the X column, values are listed from the highest to lowest, without skipping any.
For the frequency column, tallies are determined for each value (how often each X value occurs in the data set). These tallies are the frequencies for each X value.
The sum of the frequencies should equal N.
Frequency Distribution Tables
A third column can be used for the proportion (p) for each category: p = f/N. The sum of the p column should equal 1.00.
A fourth column can display the percentage of the distribution corresponding to each X value. The percentage is found by multiplying p by 100. The sum of the percentage column is 100%.
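The columns described above (X, f, p, and percentage) can be sketched in a few lines of Python. This is only an illustrative sketch: the quiz scores and the function name `frequency_table` are hypothetical and do not come from the slides, and the scores are assumed to be whole numbers.

```python
from collections import Counter

def frequency_table(scores):
    """Return rows (X, f, p, percent), with X listed from highest to lowest."""
    counts = Counter(scores)
    n = len(scores)
    rows = []
    # List every X value from the highest to the lowest without skipping any,
    # so a value that never occurs still appears with f = 0.
    for x in range(max(scores), min(scores) - 1, -1):
        f = counts[x]          # tally for this X value
        p = f / n              # proportion: p = f/N
        rows.append((x, f, p, 100 * p))
    return rows

# Hypothetical quiz scores (not from the slides), N = 10.
scores = [5, 3, 4, 5, 1, 3, 3, 4, 5, 3]
table = frequency_table(scores)

print(" X   f     p      %")
for x, f, p, pct in table:
    print(f"{x:2d}  {f:2d}  {p:.2f}  {pct:5.1f}")
print("Sum of f:", sum(row[1] for row in table))
```

Checking that the f column sums to N and the p column sums to 1.00 is a quick way to catch tallying mistakes.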
A frequency distribution is a tabular summary of data showing the frequency (or number) of items in each of several nonoverlapping classes.
The objective is to provide insights about the data that cannot be quickly obtained by looking only at the original data.
Frequency Distribution
Example: Scores of students from the petroleum engineering department
The scores of 20 students are listed below.
Below Average
Above Average
Above Average
Average
Above Average
Average
Above Average
Average
Above Average
Below Average
Low
Excellent
Above Average
Average
Above Average
Above Average
Below Average
Low
Above Average
Average
Average
Frequency Distribution
Scores           Frequency
Low                  2
Below Average        3
Average              5
Above Average        9
Excellent            1
Total               20
The relative frequency of a class is the fraction or proportion of the total number of data items belonging to the class.
A relative frequency distribution is a tabular summary of a set of data showing the relative frequency for each class.
Relative Frequency Distribution
Percent Frequency Distribution
The percent frequency of a class is the relative frequency multiplied by 100.
A percent frequency distribution is a tabular summary of a set of data showing the percent frequency for each class.
Relative Frequency and Percent Frequency Distributions
Scores           Relative Frequency   Percent Frequency
Low                    .10                  10
Below Average          .15                  15
Average                .25                  25
Above Average          .45                  45
Excellent              .05                   5
Total                 1.00                 100
For example, the relative frequency of Excellent is 1/20 = .05, and the percent frequency of Low is .10(100) = 10.
Frequency distribution graphs
Frequency distribution graphs are useful because they show the entire set of scores.
At a glance, you can determine the highest score, the lowest score, and where the scores are centered.
The graph also shows whether the scores are clustered together or scattered over a wide range.
Frequency Distribution Graphs
In a frequency distribution graph, the
score categories (X values) are listed on the
X axis and the frequencies are listed on the Y
axis.
Bar Graph
A bar graph is a graphical device for depicting qualitative data.
On one axis (usually the horizontal axis), we specify the labels that are used for each of the classes.
A frequency, relative frequency, or percent frequency scale can be used for the other axis (usually the vertical axis).
Using a bar of fixed width drawn above each class label, we extend the height appropriately.
The bars are separated to emphasize the fact that each class is a separate category.
Bar Graph
[Figure: bar graph of the scores of students. Horizontal axis: Rating (Low, Below Average, Average, Above Average, Excellent). Vertical axis: Frequency (1 to 10).]
Pie Chart
The pie chart is a commonly used graphical device for presenting relative frequency distributions for qualitative data.
First draw a circle; then use the relative
frequencies to subdivide the circle
into sectors that correspond to the
relative frequency for each class.
Since there are 360 degrees in a circle,
a class with a relative frequency of .25 would
consume .25(360) = 90 degrees of the circle.
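The sector calculation above can be applied to every class of the student-score example. The sketch below assumes the frequencies from the frequency table (N = 20); the function name `sector_angles` is hypothetical.

```python
# Frequencies from the student-score example (N = 20).
frequencies = {
    "Low": 2,
    "Below Average": 3,
    "Average": 5,
    "Above Average": 9,
    "Excellent": 1,
}

def sector_angles(freqs):
    """Map each class label to its pie-chart sector angle in degrees."""
    n = sum(freqs.values())
    # relative frequency = f / N; sector angle = relative frequency * 360
    return {label: 360 * f / n for label, f in freqs.items()}

angles = sector_angles(frequencies)
for label, angle in angles.items():
    print(f"{label:14s} {angle:6.1f} degrees")
print("Total:", sum(angles.values()), "degrees")
```

The Average class, with a relative frequency of .25, gets the 90 degree sector worked out in the text, and the sectors together fill the full 360 degrees.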
Pie Chart
[Figure: pie chart of the scores of students. Sectors: Low 10%, Below Average 15%, Average 25%, Above Average 45%, Excellent 5%.]
Grouped Frequency Distribution
Grouped Frequency Distribution
Sometimes, however, a set of scores covers a
wide range of values. In these situations, a list of
all the X values would be quite long - too long to
be a “simple” presentation of the data.
To remedy this situation, a grouped frequency
distribution table is used.
Grouped Frequency Distribution
In a grouped table, the X column lists groups of
scores, called class intervals, rather than individual
values.
These intervals all have the same width, usually a
simple number such as 2, 5, 10, and so on.
Each interval begins with a value that is a multiple of
the interval width. The interval width is selected so
that the table will have approximately ten intervals.
Example: The following table shows permeability values taken from 50 wells in a case study.
Permeability values for 50 samples
91 78 93 57 75 52 99 80 97 62
71 69 72 89 66 75 79 75 72 76
104 74 62 68 97 105 77 65 80 109
85 97 88 68 83 68 71 69 67 74
62 82 98 101 79 105 79 69 62 73
Including a line in the table for every sample is not a good idea.
The values need to be grouped into classes.
Frequency Distribution
Guidelines for Selecting Number of
Classes
• Use between 5 and 20 classes.
• Data sets with a larger number of elements usually require a larger number of classes.
• Smaller data sets usually require fewer classes
Frequency Distribution
Guidelines for Selecting Width of Classes
• Use classes of equal width.
• Approximate Class Width = (Largest Data Value - Smallest Data Value) / Number of Classes
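As a small worked example of the class-width guideline, the sketch below uses the largest (109) and smallest (52) permeability values from the example data and six classes. The function name is hypothetical, and rounding up to the next whole number is an assumption about how to reach a convenient width.

```python
import math

def approximate_class_width(largest, smallest, number_of_classes):
    """Approximate class width = (largest - smallest) / number of classes."""
    return (largest - smallest) / number_of_classes

# Largest and smallest permeability values in the example data.
largest, smallest = 109, 52

raw_width = approximate_class_width(largest, smallest, 6)
# Assumption: round up to the next whole number to get a convenient width.
width = math.ceil(raw_width)

print(raw_width)  # 9.5
print(width)      # 10
```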
Summarizing Quantitative Data
Frequency Distribution
Relative Frequency and Percent
Frequency Distributions
Histogram
Cumulative Distributions
Frequency Distribution
For the permeability values, if we choose six classes:
Approximate Class Width = (109 - 52)/6 = 9.5 ≈ 10
Permeability     Frequency
50-59                2
60-69               13
70-79               16
80-89                7
90-99                7
100-109              5
Total               50
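The grouped table can be reproduced from the 50 permeability values. The sketch below is one possible way to do the tally, assuming integer data and a class width of 10; the function name `grouped_frequencies` is hypothetical.

```python
# The 50 permeability values from the example.
permeability = [
    91, 78, 93, 57, 75, 52, 99, 80, 97, 62,
    71, 69, 72, 89, 66, 75, 79, 75, 72, 76,
    104, 74, 62, 68, 97, 105, 77, 65, 80, 109,
    85, 97, 88, 68, 83, 68, 71, 69, 67, 74,
    62, 82, 98, 101, 79, 105, 79, 69, 62, 73,
]

def grouped_frequencies(values, width):
    """Count integer values per class interval.

    Each interval starts at a multiple of the width, as the guidelines suggest.
    """
    counts = {}
    lower = (min(values) // width) * width
    # Create every interval up front so empty classes would still be listed.
    while lower <= max(values):
        counts[(lower, lower + width - 1)] = 0
        lower += width
    for v in values:
        lower = (v // width) * width
        counts[(lower, lower + width - 1)] += 1
    return counts

table = grouped_frequencies(permeability, 10)
for (low, high), f in table.items():
    print(f"{low}-{high}: {f}")
print("Total:", sum(table.values()))
```

The total of the frequency column should equal the number of samples, which is a useful check that no value fell outside the intervals.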
Relative Frequency and Percent Frequency Distributions
Permeability     Relative Frequency   Percent Frequency
50-59                  .04                   4
60-69                  .26                  26
70-79                  .32                  32
80-89                  .14                  14
90-99                  .14                  14
100-109                .10                  10
Total                 1.00                 100
For example, the relative frequency of the 50-59 class is 2/50 = .04, and its percent frequency is .04(100) = 4.
Preview cumulative frequencies here.
• Only 4% of the permeability values are in the 50-59 class.
• The greatest percentage (32% or almost one-third) of the permeability values are in the 70-79 class.
• 30% of the permeability values are under 70.
• 10% of the permeability values are 100 or more.
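The insights listed above can be checked directly from the class frequencies. The sketch below assumes the six-class permeability frequency distribution; the variable names are illustrative only.

```python
classes = ["50-59", "60-69", "70-79", "80-89", "90-99", "100-109"]
frequencies = [2, 13, 16, 7, 7, 5]
n = sum(frequencies)

# Percent frequency = relative frequency * 100 = 100 * f / N
percent = [100 * f / n for f in frequencies]

# Class with the greatest percent frequency.
top_percent, top_class = max(zip(percent, classes))

# Values under 70 fall in the first two classes.
under_70 = percent[0] + percent[1]
# Values of 100 or more fall in the last class.
at_least_100 = percent[-1]

print(f"{classes[0]}: {percent[0]:.0f}%")
print(f"Largest class: {top_class} ({top_percent:.0f}%)")
print(f"Under 70: {under_70:.0f}%")
print(f"100 or more: {at_least_100:.0f}%")
```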
Insights Gained from the Percent Frequency
Distribution
Relative Frequency and
Percent Frequency Distributions
Relative frequency
Many populations are so large that it is
impossible to know the exact number of
individuals (frequency) for any specific
category.
In these situations, population distributions
can be shown using relative frequency
instead of the absolute number of individuals
for each category.
2007©BOLD Educational Software
Practice: Determine the Relative Frequency Distribution
Ages           f    Relative Freq.
10 up to 19    2    2/60
20 up to 29    1    1/60
30 up to 39    5
40 up to 49   20
50 up to 59   25
60 up to 69    3
70 up to 79    4
Total         60    1
Frequency Distribution Graphs
In a frequency distribution graph, the
score categories (X values) are listed on the
X axis and the frequencies are listed on the Y
axis.
When the score categories consist of
numerical scores from an interval or ratio
scale, the graph should be either a histogram
or a polygon.
Histograms
In a histogram, a bar is centered above each
score (or class interval) so that the height of
the bar corresponds to the frequency and the
width extends to the real limits, so that
adjacent bars touch.
Histogram
Another common graphical presentation of quantitative data is a histogram.
The variable of interest is placed on the horizontal axis.
A rectangle is drawn above each class interval with its height corresponding to the interval’s frequency, relative frequency, or percent frequency.
Unlike a bar graph, a histogram has no natural separation between rectangles of adjacent classes.
In informal discussions bar graphs and histograms
are often equated. In this class you should be
careful to keep them straight.
Histogram
[Figure: histogram of the permeability values. Horizontal axis: permeability (classes 50-59, 60-69, 70-79, 80-89, 90-99, 100-109). Vertical axis: Frequency (2 to 18).]
Cumulative Distributions
Cumulative frequency distribution: shows the number of items with values less than or equal to the upper limit of each class.
Cumulative relative frequency distribution: shows the proportion of items with values less than or equal to the upper limit of each class.
Cumulative percent frequency distribution: shows the percentage of items with values less than or equal to the upper limit of each class.
Cumulative Distributions
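Permeability   Cumulative Frequency   Cumulative Relative Frequency   Cumulative Percent Frequency
≤ 59                    2                       .04                             4
≤ 69                   15                       .30                            30
≤ 79                   31                       .62                            62
≤ 89                   38                       .76                            76
≤ 99                   45                       .90                            90
≤ 109                  50                      1.00                           100
For example, for the ≤ 69 row: cumulative frequency = 2 + 13 = 15, cumulative relative frequency = 15/50 = .30, and cumulative percent frequency = .30(100) = 30.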
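The three cumulative distributions are running totals of the class frequencies. A minimal sketch, assuming the six-class permeability frequencies and using `itertools.accumulate` from the Python standard library:

```python
from itertools import accumulate

upper_limits = [59, 69, 79, 89, 99, 109]
frequencies = [2, 13, 16, 7, 7, 5]
n = sum(frequencies)

# Running total of frequencies up to and including each class.
cum_freq = list(accumulate(frequencies))
# Proportion and percentage of items at or below each upper limit.
cum_rel = [c / n for c in cum_freq]
cum_pct = [100 * c / n for c in cum_freq]

for limit, cf, cr, cp in zip(upper_limits, cum_freq, cum_rel, cum_pct):
    print(f"<= {limit:3d}  {cf:2d}  {cr:.2f}  {cp:5.1f}")
```

The last row should always show the full sample: a cumulative frequency of N, a cumulative relative frequency of 1.00, and a cumulative percent frequency of 100.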
Cumulative Percent Frequencies
[Figure: cumulative percent frequency graph of the permeability values. Horizontal axis: permeability (50 to 110). Vertical axis: Cumulative Percent Frequency (20 to 100). Marked point: (89.5, 76).]
Data analysis using SPSS
(Statistical Package for the Social Sciences)
Descriptive Statistics
Class A--IQs of 13 Students
102 115
128 109
131 89
98 106
140 119
93 97
110
Class B--IQs of 13 Students
127 162
131 103
96 111
80 109
93 87
120 105
109
An Illustration:
Which Group is Smarter?
Each individual may be different. If you try to understand a group by remembering the qualities of each member, you become overwhelmed and fail to understand the group.
SPSS Output for Frequency Distribution
IQ
Valid Frequency Percent Valid Percent Cumulative Percent
82.00 1 4.2 4.2 4.2
87.00 1 4.2 4.2 8.3
89.00 1 4.2 4.2 12.5
93.00 2 8.3 8.3 20.8
96.00 1 4.2 4.2 25.0
97.00 1 4.2 4.2 29.2
98.00 1 4.2 4.2 33.3
102.00 1 4.2 4.2 37.5
103.00 1 4.2 4.2 41.7
105.00 1 4.2 4.2 45.8
106.00 1 4.2 4.2 50.0
107.00 1 4.2 4.2 54.2
109.00 1 4.2 4.2 58.3
111.00 1 4.2 4.2 62.5
115.00 1 4.2 4.2 66.7
119.00 1 4.2 4.2 70.8
120.00 1 4.2 4.2 75.0
127.00 1 4.2 4.2 79.2
128.00 1 4.2 4.2 83.3
131.00 2 8.3 8.3 91.7
140.00 1 4.2 4.2 95.8
162.00 1 4.2 4.2 100.0
Total 24 100.0 100.0
Frequency Distribution
Frequency Distribution of IQ for Two Classes
IQ Frequency
82.00 1
87.00 1
89.00 1
93.00 2
96.00 1
97.00 1
98.00 1
102.00 1
103.00 1
105.00 1
106.00 1
107.00 1
109.00 1
111.00 1
115.00 1
119.00 1
120.00 1
127.00 1
128.00 1
131.00 2
140.00 1
162.00 1
Total 24
Relative Frequency Distribution
Relative Frequency Distribution of IQ for Two Classes
IQ Frequency Percent Valid Percent Cumulative Percent
82.00 1 4.2 4.2 4.2
87.00 1 4.2 4.2 8.3
89.00 1 4.2 4.2 12.5
93.00 2 8.3 8.3 20.8
96.00 1 4.2 4.2 25.0
97.00 1 4.2 4.2 29.2
98.00 1 4.2 4.2 33.3
102.00 1 4.2 4.2 37.5
103.00 1 4.2 4.2 41.7
105.00 1 4.2 4.2 45.8
106.00 1 4.2 4.2 50.0
107.00 1 4.2 4.2 54.2
109.00 1 4.2 4.2 58.3
111.00 1 4.2 4.2 62.5
115.00 1 4.2 4.2 66.7
119.00 1 4.2 4.2 70.8
120.00 1 4.2 4.2 75.0
127.00 1 4.2 4.2 79.2
128.00 1 4.2 4.2 83.3
131.00 2 8.3 8.3 91.7
140.00 1 4.2 4.2 95.8
162.00 1 4.2 4.2 100.0
Total 24 100.0 100.0
Grouped Relative Frequency Distribution
Relative Frequency Distribution of IQ for Two Classes
IQ             Frequency   Percent   Cumulative Percent
80 – 89            3        12.5         12.5
90 – 99            5        20.8         33.3
100 – 109          6        25.0         58.3
110 – 119          3        12.5         70.8
120 – 129          3        12.5         83.3
130 – 139          2         8.3         91.7
140 – 149          1         4.2         95.8
150 and over       1         4.2        100.0
Total             24       100.0        100.0
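The Percent and Cumulative Percent columns of the grouped table can be recomputed from the class frequencies. The sketch below assumes the grouped IQ frequencies (N = 24) and rounding to one decimal place, as in the SPSS output. It also shows why cumulative percentages are better computed from cumulative counts: for the 130 – 139 row, 22/24 rounds to 91.7, whereas adding the already-rounded Percent values would give 91.6.

```python
from itertools import accumulate

classes = ["80-89", "90-99", "100-109", "110-119",
           "120-129", "130-139", "140-149", "150 and over"]
frequencies = [3, 5, 6, 3, 3, 2, 1, 1]
n = sum(frequencies)

# Percent for each class, rounded to one decimal place.
percent = [round(100 * f / n, 1) for f in frequencies]

# Cumulative percent from cumulative counts (rounded once, at the end).
cum_from_counts = [round(100 * c / n, 1) for c in accumulate(frequencies)]

# Cumulative percent from summing the rounded percents (rounding error builds up).
cum_from_rounded = [round(p, 1) for p in accumulate(percent)]

for label, f, p, c in zip(classes, frequencies, percent, cum_from_counts):
    print(f"{label:13s} {f:2d}  {p:5.1f}  {c:5.1f}")
```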
SPSS Output for Histogram
[Figure: SPSS histogram of IQ. Horizontal axis: IQ (80.00 to 160.00). Vertical axis: Frequency (0 to 6). Mean = 110.4583, Std. Dev. = 19.00338, N = 24.]
Histogram
[Figure: Histogram of IQ Scores for Two Classes. Horizontal axis: IQ (80.00 to 160.00). Vertical axis: Frequency (0 to 6).]
Bar Graph
[Figure: Bar Graph of Number of Students in Two Classes. Horizontal axis: Class (1.00, 2.00). Vertical axis: Count (0 to 12).]