3-Statistical Description of Data [Compatibility Mode]
8/27/2011
Chapter 3.
STATISTICAL DESCRIPTION OF DATA
Frequency Distributions
Grouped Data
Percentiles, Deciles & Quartiles
Graphical Representations
Symmetry and Skewness
Objectives
Set up a frequency distribution for a mass of data.
Calculate the mean, median and mode for grouped data.
Calculate and interpret other measures of location like the deciles, quartiles & percentiles.
Calculate the standard deviation, variance, mean deviation and quartile deviation for grouped data.
Construct histograms, bar charts, frequency polygons, pie charts and ogives.
Describe a given set of data in terms of skewness and kurtosis.
Statistical data collected should be arranged in such a manner that will allow a reader to distinguish their
essential features. Depending on the type and the objectives of the person presenting the information, data may
be presented using one or a combination of three forms.
Three Forms of Presenting Data
Textual Form - data is presented in paragraph form, especially when they are purely qualitative or when very few numbers are involved.
Tabular Form - data is presented in rows and columns
Graphical Form - data is presented in visual form
[Figure: sample chart of monthly values from January to December for the years 1991 to 1995; vertical axis from 0 to 4000]
When the data include a large number of observations, it is convenient to group the values into mutually exclusive classes and show the number of observations occurring in each class in a tabular form.
Frequency Distribution
A frequency distribution is the arrangement of data that shows the frequency of occurrence of values falling within arbitrarily defined ranges of the variable known as class intervals. The smallest and largest values that fall in a given interval are called class limits.
Class Frequency and Class Mark
Class frequency refers to the number of observations falling in a particular class, while the midpoint between the upper and lower class limits is called the class mark or midpoint.
Steps in Making a Frequency Distribution
Find the range.
Determine the interval size by dividing the range by the desired number of classes, which is normally not less than 10 and not more than 20.
Determine the class limits of the class intervals. Tabulation is facilitated if the lower class limits of the class intervals are multiples of the class size. The bottom interval must include the lowest score.
List the intervals, beginning at the bottom.
Tally the frequencies.
Summarize these under a column labeled f.
Total this column and record the number at the bottom.
Problem:
Construct a frequency distribution of the given scores on a test.
56 28 42 56 47 39 62 60 54 47
78 82 55 56 41 44 54 42 62 48
62 38 57 55 50 47 42 56 68 53
37 72 65 66 52 52 48 48 42 68
Solution:
Computing for the range:
R = 82 - 28 = 54
Computing for the class interval:
$$i = \frac{54}{10} = 5.4$$
Therefore, the class interval may be 5 or 6.
We choose 5 because it is the odd number (an odd interval gives whole-number class marks).
If i = 5, the lowest limit should be 25.
We choose 25 because it is the largest multiple of the chosen interval that does not exceed the smallest value in the set.
If the lowest limit is 25, the bottom interval should be 29 - 25.
The interval 29 - 25 contains the lowest score (28).
Classes    Tally      f
84 - 80    /          1
79 - 75    /          1
74 - 70    /          1
69 - 65    ////       4
64 - 60    ////       4
59 - 55    ///////    7
54 - 50    //////     6
49 - 45    //////     6
44 - 40    //////     6
39 - 35    ///        3
34 - 30               0
29 - 25    /          1
N = Σf = 40
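The tabulation steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not part of the original lecture: the function name `frequency_distribution` is hypothetical, and it assumes integer scores so that each class runs from its lower limit to lower limit + i - 1.

```python
# Scores from the test problem above.
scores = [
    56, 28, 42, 56, 47, 39, 62, 60, 54, 47,
    78, 82, 55, 56, 41, 44, 54, 42, 62, 48,
    62, 38, 57, 55, 50, 47, 42, 56, 68, 53,
    37, 72, 65, 66, 52, 52, 48, 48, 42, 68,
]


def frequency_distribution(values, interval):
    """Return (lower limit, upper limit, frequency) rows, lowest class first.

    Hypothetical helper: assumes integer values and inclusive class limits.
    """
    # Bottom lower limit: largest multiple of the interval not above the minimum.
    lower = (min(values) // interval) * interval
    rows = []
    while lower <= max(values):
        upper = lower + interval - 1
        f = sum(1 for v in values if lower <= v <= upper)
        rows.append((lower, upper, f))
        lower += interval
    return rows


table = frequency_distribution(scores, interval=5)

# Print from the top class down, in the "upper - lower" style used here.
for lower, upper, f in reversed(table):
    print(f"{upper} - {lower}  {f}")
print("N =", sum(f for _, _, f in table))
```

The tally column is omitted because the count f carries the same information.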
MEASURES OF CENTRAL TENDENCY
For Grouped Data (> 30 values)
MEAN
Methods:
1. Midpoint Method
2. Short Method
Midpoint Method
After the f column, make another column and enter the midpoint (Xm) of each class. Multiply the frequency by the midpoint and enter it in the next column. Label the column fXm. Get the sum. Use the formula:
$$\bar{x} = \frac{\sum f X_m}{N}$$
Short Method
Choose a class at or near the middle of the distribution to be designated as the origin. After the f column, construct the deviation column (d). Mark the chosen class zero. In succession, write -1, -2 and so on for classes lower in value than the origin. In like manner, write 1, 2, 3 and so on for classes greater in value than the origin. Construct the f x d column and get the algebraic sum.
Use the formula:
$$\bar{x} = z + \left(\frac{\sum fd}{N}\right) i$$
where z = midpoint of the class chosen as origin, Σfd = algebraic sum of the f x d column, and i = interval size
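The short method gives the same mean as the midpoint method. A brief derivation, added here as a supporting step and assuming all classes share the same interval size i, so that every midpoint can be written as the origin plus a whole number of intervals:

$$X_m = z + i\,d$$

$$\sum f X_m = \sum f (z + i\,d) = zN + i \sum fd$$

$$\bar{x} = \frac{\sum f X_m}{N} = z + \left(\frac{\sum fd}{N}\right) i$$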
Problem:
For the given frequency distribution, compute for the mean using:
Midpoint Method
Short Method
Classes    f
54-50      4
49-45      7
44-40      12
39-35      10
34-30      9
29-25      6
24-20      2
Solution:
Using Midpoint Method
Classes    f     Xm    fXm
54-50      4     52    208
49-45      7     47    329
44-40      12    42    504
39-35      10    37    370
34-30      9     32    288
29-25      6     27    162
24-20      2     22    44
N = 50, ΣfXm = 1905
$$\bar{x} = \frac{\sum f X_m}{N} = \frac{1905}{50} = 38.1$$
Using Short Method
Classes    f     d     fd
54-50      4     3     12
49-45      7     2     14
44-40      12    1     12
39-35      10    0     0
34-30      9     -1    -9
29-25      6     -2    -12
24-20      2     -3    -6
N = 50, Σfd = 11, z = 37
$$\bar{x} = z + \left(\frac{\sum fd}{N}\right) i = 37 + \left(\frac{11}{50}\right)(5) = 38.1$$
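Both methods can be checked with a short script. This is a sketch for verification only; the variable names are illustrative, and the origin z = 37 (class 39-35) is the one chosen in the solution above.

```python
# (lower limit, upper limit, frequency), top class first as in the table.
classes = [(50, 54, 4), (45, 49, 7), (40, 44, 12), (35, 39, 10),
           (30, 34, 9), (25, 29, 6), (20, 24, 2)]
interval = 5

n = sum(f for _, _, f in classes)

# Midpoint method: mean = sum(f * Xm) / N
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
sum_fxm = sum(f * xm for (_, _, f), xm in zip(classes, midpoints))
mean_midpoint = sum_fxm / n

# Short method: mean = z + (sum(f * d) / N) * i, with class 39-35 as origin.
z = 37
deviations = [round((xm - z) / interval) for xm in midpoints]
sum_fd = sum(f * d for (_, _, f), d in zip(classes, deviations))
mean_short = z + (sum_fd / n) * interval

print(f"N = {n}, sum fXm = {sum_fxm:.0f}, sum fd = {sum_fd}")
print(f"midpoint method: {mean_midpoint:.1f}")
print(f"short method:    {mean_short:.1f}")
```

The short method only needs small whole numbers in the d column, which is why it was preferred for hand computation.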
MEDIAN
Steps:
Find N/2.
Find the accumulated sum of the frequencies up to the sum that contains N/2.
Use the formula:
$$Md = L + \left(\frac{\frac{N}{2} - cf}{f}\right) i$$
where L = lower limit of the class which contains N/2
f = frequency of the class containing N/2
cf = cumulative sum that approaches or is equal to N/2
MODE
Rough Mode (R. Mo) - obtained by inspection and is equal to the Xm of the class having the highest frequency.
Theoretical Mode (T. Mo) - computed from the median and the mean:
$$T.\,Mo = 3Md - 2\bar{x}$$
Problem:
For the given frequency distribution in the previous problem, compute for the:
Median
R. Mode
T. Mode
Solution:
Computing for the Median
$$\frac{N}{2} = \frac{50}{2} = 25$$
Classes    f     cf
54-50      4
49-45      7
44-40      12
39-35      10    27
34-30      9     17
29-25      6     8
24-20      2     2
N = 50, i = 5
$$Md = L + \left(\frac{\frac{N}{2} - cf}{f}\right) i = 35 + \left(\frac{25 - 17}{10}\right)(5) = 39$$
Computing for the Mode
Classes    f
54-50      4
49-45      7
44-40      12
39-35      10
34-30      9
29-25      6
24-20      2
N = 50
R. Mode = 42
Since the mean is 38.1 and Md = 39,
$$T.\,Mode = 3Md - 2\bar{x} = 3(39) - 2(38.1) = 40.8$$
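The median and both modes can be reproduced with the sketch below. It follows the convention used in these slides of taking L as the lower class limit (many textbooks use the lower class boundary instead, which would shift the result by 0.5). Variable names are illustrative only.

```python
# (lower limit, upper limit, frequency), lowest class first.
classes = [(20, 24, 2), (25, 29, 6), (30, 34, 9), (35, 39, 10),
           (40, 44, 12), (45, 49, 7), (50, 54, 4)]
interval = 5
mean = 38.1  # from the midpoint method above

n = sum(f for _, _, f in classes)
half = n / 2

# Accumulate frequencies until the class containing N/2 is reached.
cf = 0
for lower, upper, f in classes:
    if cf + f >= half:
        median = lower + (half - cf) / f * interval
        break
    cf += f

# Rough mode: midpoint of the class with the highest frequency.
mode_lower, mode_upper, _ = max(classes, key=lambda row: row[2])
rough_mode = (mode_lower + mode_upper) / 2

# Theoretical mode: 3 * median - 2 * mean.
theoretical_mode = 3 * median - 2 * mean

print(f"Md = {median:.1f}, R. Mode = {rough_mode:.0f}, T. Mode = {theoretical_mode:.1f}")
```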
Other Measures of Position
Quartiles
Deciles
Percentiles
Quartiles - those which divide the distribution into 4 parts
$$Q_k = L + \left(\frac{\frac{kN}{4} - cf}{f}\right) i$$
Deciles - those which divide the distribution into 10 parts
$$D_k = L + \left(\frac{\frac{kN}{10} - cf}{f}\right) i$$
Percentiles - those which divide the distribution into 100 parts
$$P_k = L + \left(\frac{\frac{kN}{100} - cf}{f}\right) i$$
Problem:
For the given frequency distribution in the previous problem, compute for:
Q1
D3
P88
Solution:
Computing for Q1
$$\frac{kN}{4} = \frac{(1)(50)}{4} = 12.5$$
Classes    f     cf
54-50      4
49-45      7
44-40      12
39-35      10
34-30      9     17
29-25      6     8
24-20      2     2
N = 50, i = 5
$$Q_1 = L + \left(\frac{\frac{kN}{4} - cf}{f}\right) i = 30 + \left(\frac{12.5 - 8}{9}\right)(5) = 32.5$$
Computing for D3
$$\frac{kN}{10} = \frac{(3)(50)}{10} = 15$$
Classes    f     cf
54-50      4
49-45      7
44-40      12
39-35      10
34-30      9     17
29-25      6     8
24-20      2     2
N = 50, i = 5
$$D_3 = L + \left(\frac{\frac{kN}{10} - cf}{f}\right) i = 30 + \left(\frac{15 - 8}{9}\right)(5) = 33.89$$
Computing for P88
$$\frac{kN}{100} = \frac{(88)(50)}{100} = 44$$
Classes    f     cf
54-50      4
49-45      7     46
44-40      12    39
39-35      10    27
34-30      9     17
29-25      6     8
24-20      2     2
N = 50, i = 5
$$P_{88} = L + \left(\frac{\frac{kN}{100} - cf}{f}\right) i = 45 + \left(\frac{44 - 39}{7}\right)(5) = 48.57$$
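Since the quartile, decile and percentile formulas differ only in the divisor, one function can cover all three. The sketch below is illustrative: `grouped_quantile` is a hypothetical name, L is taken as the lower class limit as in the solutions above, and when kN/parts falls exactly on a cumulative sum the lower of the two candidate classes is used (an assumption of this sketch).

```python
def grouped_quantile(classes, k, parts, interval):
    """k-th quantile of grouped data split into `parts` parts.

    classes: (lower limit, upper limit, frequency) rows, lowest class first.
    parts: 4 for quartiles, 10 for deciles, 100 for percentiles.
    """
    n = sum(f for _, _, f in classes)
    target = k * n / parts
    cf = 0  # cumulative frequency below the current class
    for lower, _, f in classes:
        if f > 0 and cf + f >= target:
            return lower + (target - cf) / f * interval
        cf += f
    raise ValueError("k must not exceed the number of parts")


classes = [(20, 24, 2), (25, 29, 6), (30, 34, 9), (35, 39, 10),
           (40, 44, 12), (45, 49, 7), (50, 54, 4)]

q1 = grouped_quantile(classes, 1, 4, 5)
d3 = grouped_quantile(classes, 3, 10, 5)
p88 = grouped_quantile(classes, 88, 100, 5)
print(f"Q1 = {q1:.2f}, D3 = {d3:.2f}, P88 = {p88:.2f}")
```

Calling it with k = 2 and parts = 4 reproduces the median formula, since the median is the second quartile.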
MEASURES OF VARIATION
For Grouped Data (> 30 values)
RANGE
The range is computed as the difference between the upper limit of the highest class interval and the lower limit of the lowest class interval.
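VARIANCE
$$\sigma^2 = \frac{\sum f (X_m - \bar{x})^2}{N}$$
STANDARD DEVIATION
$$\sigma = \sqrt{\frac{\sum f (X_m - \bar{x})^2}{N}}$$
MEAN DEVIATION
$$D = \frac{\sum f \left| X_m - \bar{x} \right|}{N}$$
QUARTILE DEVIATION
$$Q = \frac{Q_3 - Q_1}{2}$$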
Problem:
For the given frequency distribution, determine:
variance
standard deviation
mean deviation
quartile deviation
Classes    f
89-85      1
84-80      1
79-75      2
74-70      3
69-65      4
64-60      4
59-55      7
54-50      6
49-45      6
44-40      6
39-35      3
34-30      1
Solution:
Computing for the Mean
Classes    f     Xm    fXm
89-85      1     87    87
84-80      1     82    82
79-75      2     77    154
74-70      3     72    216
69-65      4     67    268
64-60      4     62    248
59-55      7     57    399
54-50      6     52    312
49-45      6     47    282
44-40      6     42    252
39-35      3     37    111
34-30      1     32    32
N = 44, ΣfXm = 2443
$$\bar{x} = \frac{\sum f X_m}{N} = \frac{2443}{44} = 55.5$$
Computing for the Variance
Classes    f     Xm - x̄    (Xm - x̄)²    f(Xm - x̄)²
89-85      1     31.5      992.25       992.25
84-80      1     26.5      702.25       702.25
79-75      2     21.5      462.25       924.50
74-70      3     16.5      272.25       816.75
69-65      4     11.5      132.25       529.00
64-60      4     6.5       42.25        169.00
59-55      7     1.5       2.25         15.75
54-50      6     -3.5      12.25        73.50
49-45      6     -8.5      72.25        433.50
44-40      6     -13.5     182.25       1093.50
39-35      3     -18.5     342.25       1026.75
34-30      1     -23.5     552.25       552.25
N = 44, Σf(Xm - x̄)² = 7329
$$\sigma^2 = \frac{\sum f (X_m - \bar{x})^2}{N} = \frac{7329}{44} = 166.57$$
Computing for the Standard Deviation
Since the variance is 166.57,
$$\sigma = \sqrt{166.57} = 12.906$$
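The variance and standard deviation can be verified with the sketch below. It is illustrative only: it rounds the mean to 55.5 to match the hand computation (the unrounded mean is about 55.52), and it divides by N as the formulas in this chapter do.

```python
import math

# (lower limit, upper limit, frequency), top class first as in the table.
classes = [(85, 89, 1), (80, 84, 1), (75, 79, 2), (70, 74, 3),
           (65, 69, 4), (60, 64, 4), (55, 59, 7), (50, 54, 6),
           (45, 49, 6), (40, 44, 6), (35, 39, 3), (30, 34, 1)]

n = sum(f for _, _, f in classes)
midpoints = [(lo + hi) / 2 for lo, hi, _ in classes]
freqs = [f for _, _, f in classes]

# Mean rounded to one decimal, as in the worked solution.
mean = round(sum(f * xm for f, xm in zip(freqs, midpoints)) / n, 1)

# Sum of f * (Xm - mean)^2, then divide by N.
sum_sq = sum(f * (xm - mean) ** 2 for f, xm in zip(freqs, midpoints))
variance = sum_sq / n
std_dev = math.sqrt(variance)

print(f"mean = {mean}, sum f(Xm - mean)^2 = {sum_sq:.0f}")
print(f"variance = {variance:.2f}, standard deviation = {std_dev:.3f}")
```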
Computing for the Mean Deviation
Classes    f     |Xm - x̄|    f|Xm - x̄|
89-85      1     31.5        31.5
84-80      1     26.5        26.5
79-75      2     21.5        43.0
74-70      3     16.5        49.5
69-65      4     11.5        46.0
64-60      4     6.5         26.0
59-55      7     1.5         10.5
54-50      6     3.5         21.0
49-45      6     8.5         51.0
44-40      6     13.5        81.0
39-35      3     18.5        55.5
34-30      1     23.5        23.5
N = 44, Σf|Xm - x̄| = 465
$$D = \frac{\sum f \left| X_m - \bar{x} \right|}{N} = \frac{465}{44} = 10.6$$
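The mean deviation follows the same pattern as the variance, with absolute deviations in place of squared ones. A short check, again using the rounded mean of 55.5 from the solution (names are illustrative):

```python
# Class midpoints and frequencies, top class first.
midpoints = [87, 82, 77, 72, 67, 62, 57, 52, 47, 42, 37, 32]
freqs = [1, 1, 2, 3, 4, 4, 7, 6, 6, 6, 3, 1]
mean = 55.5  # rounded mean from the worked solution

n = sum(freqs)
sum_abs = sum(f * abs(xm - mean) for f, xm in zip(freqs, midpoints))
mean_deviation = sum_abs / n

print(f"sum f|Xm - mean| = {sum_abs:.0f}, D = {mean_deviation:.1f}")
```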
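Computing for the Quartile Deviation
Classes    f     cf
89-85      1     44
84-80      1     43
79-75      2     42
74-70      3     40
69-65      4     37
64-60      4     33
59-55      7     29
54-50      6     22
49-45      6     16
44-40      6     10
39-35      3     4
34-30      1     1
N = 44
$$Q_k = L + \left(\frac{\frac{kN}{4} - cf}{f}\right) i$$
For Q1:
$$\frac{kN}{4} = \frac{(1)(44)}{4} = 11$$
$$Q_1 = 45 + \left(\frac{11 - 10}{6}\right)(5) = 45.83$$
For Q3:
$$\frac{kN}{4} = \frac{(3)(44)}{4} = 33$$
$$Q_3 = 60 + \left(\frac{33 - 33}{4}\right)(5) = 60$$
$$Q = \frac{Q_3 - Q_1}{2} = \frac{60 - 45.83}{2} = 7.085$$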
Types of Graphs
BAR GRAPH
The bar graph is particularly useful in presenting data gathered from discrete variables on a nominal scale. It uses rectangles or bars to represent discrete classes of data. The base of each bar corresponds to a class interval of the frequency distribution and the heights of the bars represent the frequencies associated with each class.
HISTOGRAM
The histogram is similar to a bar chart, but the bases of each bar are the class boundaries rather than class limits.
FREQUENCY POLYGON
A frequency polygon is a line graph of class frequencies plotted against class marks.
Problem:
For the following frequency distribution, construct:
bar graph
histogram
frequency polygon
Classes    f
54-50      4
49-45      7
44-40      12
39-35      10
34-30      9
29-25      6
24-20      2
BAR GRAPH
[Figure: bar graph of Frequency (vertical axis, 0 to 15) for the classes 20-24, 25-29, 30-34, 35-39, 40-44, 45-49 and 50-54; horizontal axis labeled Class Marks]
HISTOGRAM
[Figure: histogram of Frequency (vertical axis, 0 to 15) against Class Boundaries; the bars are labeled with the frequencies 2, 6, 9, 10, 12, 7 and 4]
FREQUENCY POLYGON
[Figure: frequency polygon of Frequency (vertical axis, 0 to 15) for the classes 20-24, 25-29, 30-34, 35-39, 40-44, 45-49 and 50-54; horizontal axis labeled Classes]
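Before drawing the histogram and the frequency polygon for this distribution, it helps to list the class boundaries (bar bases) and class marks (polygon points). The sketch below assumes integer scores, so each boundary lies 0.5 below the lower limit and 0.5 above the upper limit; the slides do not state this rule explicitly, but it is the usual convention. Names are illustrative.

```python
# (lower limit, upper limit, frequency), lowest class first.
classes = [(20, 24, 2), (25, 29, 6), (30, 34, 9), (35, 39, 10),
           (40, 44, 12), (45, 49, 7), (50, 54, 4)]

rows = []
for lower, upper, f in classes:
    lower_boundary = lower - 0.5   # assumed: halfway to the class below
    upper_boundary = upper + 0.5   # assumed: halfway to the class above
    class_mark = (lower + upper) / 2
    rows.append((lower_boundary, upper_boundary, class_mark, f))

print("boundaries       mark   f")
for lb, ub, mark, f in rows:
    print(f"{lb:4.1f} - {ub:4.1f}    {mark:4.1f}   {f}")
```

Histogram bars span each pair of boundaries, so adjacent bars touch; the frequency polygon joins the points (class mark, f) with straight lines.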
PIE CHART
A pie chart is used to represent quantities that make up a whole.
Problem:
The following table classifies enrolment in a certain university. Construct a pie chart to show the enrolment distribution.
Engineering        5280
Commerce           3000
Education          1800
Arts & Sciences    1320
Law                600
[Figure: pie chart with slices for Engineering, Commerce, Education, Arts & Sciences and Law]
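To draw the pie chart by hand, each enrolment figure is converted to a share of the total and then to a central angle out of 360 degrees. A small sketch of that arithmetic (the variable names are illustrative):

```python
enrolment = {
    "Engineering": 5280,
    "Commerce": 3000,
    "Education": 1800,
    "Arts & Sciences": 1320,
    "Law": 600,
}

total = sum(enrolment.values())

# For each college: (percentage of the whole, central angle in degrees).
slices = {
    name: (count / total * 100, count / total * 360)
    for name, count in enrolment.items()
}

print(f"Total enrolment = {total}")
for name, (percent, angle) in slices.items():
    print(f"{name:16s} {percent:5.1f}%  {angle:6.1f} degrees")
```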
CUMULATIVE FREQUENCY CURVE (Ogive Curve)
An ogive curve is a line graph obtained by plotting values from the tabular arrangement by class intervals whose frequencies are cumulated. From this curve, the centile rank of a certain score can be determined. A centile rank denotes the percentage of scores that fall below a specified score in a distribution.
Problem:
Construct the ogive curve for the given frequency distribution. What scores correspond to C50? C88? What is the centile rank of a score of 50?
Classes    f     cf     CP (cf/N x 100)
64-60      2     376    100.0
59-55      12    374    99.5
54-50      20    362    96.3
49-45      32    342    91.0
44-40      46    310    82.4
39-35      58    264    70.2
34-30      64    206    54.8
29-25      58    142    37.8
24-20      42    84     22.3
19-15      23    42     11.2
14-10      15    19     5.1
9-5        4     4      1.1
N = 376
[Figure: Ogive Curve, plotting CP (vertical axis, 0 to 120) against the upper class limits UL (horizontal axis, 0 to 64)]
Reading from the curve: C50 = 33, C88 = 48, and a score of 50 = C91.
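The cf and CP columns that supply the points of the ogive can be generated as sketched below. This only computes the plotted points (upper limit, cf, CP); reading C50, C88 and the centile rank of 50 is still done from the curve as in the solution. Names are illustrative.

```python
from itertools import accumulate

# (lower limit, upper limit, frequency), lowest class first.
classes = [(5, 9, 4), (10, 14, 15), (15, 19, 23), (20, 24, 42),
           (25, 29, 58), (30, 34, 64), (35, 39, 58), (40, 44, 46),
           (45, 49, 32), (50, 54, 20), (55, 59, 12), (60, 64, 2)]

# Running total of the frequencies from the lowest class upward.
cf = list(accumulate(f for _, _, f in classes))
n = cf[-1]

# One ogive point per class: (upper limit, cf, cumulative percentage).
points = [(upper, c, round(c / n * 100, 1))
          for (_, upper, _), c in zip(classes, cf)]

print(f"N = {n}")
for upper, c, cp in reversed(points):
    print(f"UL {upper:2d}   cf {c:3d}   CP {cp:5.1f}")
```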
Kurtosis and Skewness
The measures of skewness and kurtosis indicate the extent of departure of a distribution from normal and permit comparison of two or more distributions.
KURTOSIS (ku)
Kurtosis refers to the flatness or peakedness of a frequency distribution. It describes the shape of the curve of one distribution in relation to that of another. The coefficient of kurtosis is given by:
$$ku = \frac{Q}{P_{90} - P_{10}}$$
where Q is the quartile deviation.
Types of Kurtosis
mesokurtic (ku = 0.263)
leptokurtic (ku < 0.263)
platykurtic (ku > 0.263)
SKEWNESS (sk)
Skewness refers to the symmetry or asymmetry of a frequency distribution. The coefficient of skewness is given by:
$$sk = \frac{3(\bar{x} - md)}{s}$$
If sk = 0, the distribution is normal. (X̄ = Md = Mo)
If sk < 0, the distribution is negatively skewed. (Mo > Md > X̄)
If sk > 0, the distribution is positively skewed. (X̄ > Md > Mo)
Problem:
For a certain frequency distribution, the following data are given:
s = 13.7
md = 147.25
x̄ = 147
Q1 = 138
Q3 = 155.8
D1 = 128.8
P90 = 167.5
Determine the kurtosis and skewness of the distribution. Is it a normal distribution?
Solution:
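$$ku = \frac{Q}{P_{90} - P_{10}} = \frac{\frac{Q_3 - Q_1}{2}}{P_{90} - D_1} = \frac{\frac{155.8 - 138}{2}}{167.5 - 128.8} = 0.23$$
Distribution is leptokurtic.
$$sk = \frac{3(\bar{x} - md)}{s} = \frac{3(147 - 147.25)}{13.7} = -0.05$$
Distribution is negatively skewed.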
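The two coefficients and their classification can be checked with the sketch below. The input values are those used in the solution (including md = 147.25); D1 stands in for P10 since the first decile and the tenth percentile are the same point. The classification thresholds are the ones listed under Types of Kurtosis, and the names are illustrative.

```python
# Given values from the problem and its solution.
q1, q3 = 138.0, 155.8
d1, p90 = 128.8, 167.5       # D1 is the same point as P10
mean, median, s = 147.0, 147.25, 13.7

# Coefficient of kurtosis: quartile deviation over (P90 - P10).
quartile_deviation = (q3 - q1) / 2
ku = quartile_deviation / (p90 - d1)

# Coefficient of skewness: 3 * (mean - median) / s.
sk = 3 * (mean - median) / s

# Compare ku, rounded to three decimals, with the mesokurtic value 0.263.
ku_rounded = round(ku, 3)
if ku_rounded < 0.263:
    kurtosis_type = "leptokurtic"
elif ku_rounded > 0.263:
    kurtosis_type = "platykurtic"
else:
    kurtosis_type = "mesokurtic"

if sk < 0:
    skew_type = "negatively skewed"
elif sk > 0:
    skew_type = "positively skewed"
else:
    skew_type = "normal"

print(f"ku = {ku:.2f} ({kurtosis_type}), sk = {sk:.2f} ({skew_type})")
```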
Student Activity
Part I. Answer the following:
1. Define each of the following:
a. class mark
b. ogive
c. histogram
d. frequency polygon
2. What advantages does each of the following forms of presenting data offer?
a. textual
b. tabular
c. graphical
3. Distinguish between:
a. class limits and class boundaries
b. skewness and kurtosis
4. Give the class mark, the class boundaries and the interval size for each of the following:
a. 10 - 19
b. 1.5 - 5.0
c. 12.85 - 13.43
Part II. Solve the following using Microsoft Excel Applications.
The list below gives the weekly food budget and weekly incomes for 39 households.
1. Construct a frequency distribution table for food budget using i = 25 and determine:
a. mean
b. median
c. rough and theoretical mode
d. skewness
Food Budget    Weekly Income    Food Budget    Weekly Income
1598           1553             1639           1636
1680           1740             1655           1677
1660           1652             1736           1761
1583           1581             1587           1603
1476           1481             1622           1605
1633           1634             1689           1631
1717           1692             1700           1765
1596           1561             1613           1688
1613           1566             1615           1667
1607           1626             1458           1479
1728           1699             1750           1747
1672           1685             1700           1673
1572           1589             1654           1641
1634           1571             1625           1613
1461           1443             1565           1521
1726           1712             1563           1583
1732           1724             1566           1542
1620           1628             1587           1567
1616           1564             1584           1610
1579           1526
2. Construct a frequency distribution table for weekly income using i = 25 and determine:
a. standard deviation
b. mean deviation
c. quartile deviation
d. kurtosis
3. Plot a bar chart for food budget and superimpose on it the frequency polygon for weekly income.
4. Take the difference between weekly income and food budget for each household and construct a frequency distribution and cumulative frequency distribution.
5. Plot the ogive curve for the data in (4). What score corresponds to a centile rank of 71?
Proceed to Topic 4