3-Statistical Description of Data [Compatibility Mode]


Transcript of 3-Statistical Description of Data [Compatibility Mode]

Page 1: 3-Statistical Description of Data [Compatibility Mode]

8/27/2011

Chapter 3. STATISTICAL DESCRIPTION OF DATA

Frequency Distributions

Grouped Data

Percentiles, Deciles & Quartiles

Graphical Representations

Symmetry and Skewness

Objectives

Set up a frequency distribution for a mass of data.

Calculate the mean, median and mode for grouped data.

Calculate and interpret other measures of location like the deciles, quartiles & percentiles.

Calculate the standard deviation, variance, mean deviation and quartile deviation for grouped data.

Construct histograms, bar charts, frequency polygons, pie charts and ogives.

Describe a given set of data in terms of skewness and kurtosis.

Statistical data collected should be arranged in such a manner that will allow a reader to distinguish their essential features. Depending on the type of data and the objectives of the person presenting the information, data may be presented using one or a combination of three forms.


Three Forms of Presenting Data

Textual Form – data are presented in paragraph form, especially when they are purely qualitative or when very few numbers are involved.

Tabular Form – data are presented in rows and columns.

Graphical Form – data are presented in visual form.

[Figure: sample graph of monthly values, January to December, for the years 1991 to 1995; vertical axis from 0 to 4000]

When the data include a large number of observations, it is convenient to group the values into mutually exclusive classes and show the number of observations occurring in each class in a tabular form.

Frequency Distribution

A frequency distribution is the arrangement of data that shows the frequency of occurrence of values falling within arbitrarily defined ranges of the variable known as class intervals. The smallest and largest values that fall in a given interval are called class limits.


Class Frequency and Class Mark

Class frequency refers to the number of observations falling in a particular class, while the midpoint between the upper and lower class limits is called the class mark or midpoint.

Steps in Making a Frequency Distribution

Find the range.

Determine the interval size by dividing the range by the desired number of classes, which is normally not less than 10 and not more than 20.

Determine the class limits of the class intervals. Tabulation is facilitated if the lower class limits of the class intervals are multiples of the class size. The bottom interval must include the lowest score.

List the intervals, beginning at the bottom.

Tally the frequencies.

Summarize these under a column labeled f.

Total this column and record the number at the bottom.

Problem:

Construct a frequency distribution of the given scores on a test.

56 28 42 56 47 39 62 60 54 47

78 82 55 56 41 44 54 42 62 48

62 38 57 55 50 47 42 56 68 53

37 72 65 66 52 52 48 48 42 68


Solution:

Computing for the range:

R = 82 – 28 = 54

Computing for the class interval:

$i = \frac{R}{10} = \frac{54}{10} = 5.4$

Therefore, class interval may be 5 or 6.

We choose 5 because it is the odd number (an odd interval size gives whole-number class marks).

If i = 5, lowest limit should be 25.

We choose 25 because it is the largest multiple of the chosen interval that does not exceed the smallest value in the set (28).

If lowest limit is 25, the bottom interval should be 29 – 25.

The interval 29 – 25 contains the lowest score (28).

Classes    Tally      f
84 - 80    /          1
79 - 75    /          1
74 - 70    /          1
69 - 65    ////       4
64 - 60    ////       4
59 - 55    ///////    7
54 - 50    //////     6
49 - 45    //////     6
44 - 40    //////     6
39 - 35    ///        3
34 - 30               0
29 - 25    /          1

N = Σf = 40
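The tabulation steps above can be sketched in Python as a cross-check. This is only an illustration, not part of the original slides: the function name `frequency_distribution` is made up for this example, and it assumes whole-number scores and lower class limits that are multiples of the interval size.

```python
from collections import Counter

# The 40 test scores from the problem.
scores = [
    56, 28, 42, 56, 47, 39, 62, 60, 54, 47,
    78, 82, 55, 56, 41, 44, 54, 42, 62, 48,
    62, 38, 57, 55, 50, 47, 42, 56, 68, 53,
    37, 72, 65, 66, 52, 52, 48, 48, 42, 68,
]

def frequency_distribution(values, size):
    """Return (lower, upper, f) rows, highest class first.

    Assumes integer values; lower limits are multiples of `size`.
    """
    counts = Counter((v // size) * size for v in values)
    lowest = (min(values) // size) * size
    lower = (max(values) // size) * size
    rows = []
    while lower >= lowest:
        rows.append((lower, lower + size - 1, counts.get(lower, 0)))
        lower -= size
    return rows

table = frequency_distribution(scores, 5)
for lower, upper, f in table:
    # Print the class as "upper - lower" with a tally, as on the slide.
    print(f"{upper} - {lower}  {'/' * f:<8} {f}")
print("N =", sum(f for _, _, f in table))
```

Each score is assigned to its class by integer division, so the empty class 34 - 30 still appears with a frequency of 0.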

For Grouped Data ( > 30 values)

MEASURES OF CENTRAL TENDENCY

MEAN

Methods:
1. Midpoint Method
2. Short Method


Midpoint Method

After the f column, make another column and enter the midpoint (Xm) of each class. Multiply the frequency by the midpoint and enter the product in the next column. Label the column fXm. Get the sum. Use the formula:

$\bar{x} = \frac{\sum f X_m}{N}$

Short Method

Choose a class at or near the middle of the distribution to be designated as the origin. After the f column, construct the deviation column (d). Mark the chosen class zero. In succession, write -1, -2 and so on for classes lower in value than the origin. In like manner, write 1, 2, 3 and so on for classes greater in value than the origin. Construct the f x d column and get the algebraic sum.

Use the formula:

$\bar{x} = z + \left(\frac{\sum fd}{N}\right) i$

where z = midpoint of the class chosen as origin and $\sum fd$ is the algebraic sum of the f x d column

Problem:

For the given frequency distribution, compute for the mean using:

Midpoint Method
Short Method

Classes   f
54-50     4
49-45     7
44-40     12
39-35     10
34-30     9
29-25     6
24-20     2


Solution:

Using Midpoint Method

Classes   f     Xm    fXm
54-50     4     52    208
49-45     7     47    329
44-40     12    42    504
39-35     10    37    370
34-30     9     32    288
29-25     6     27    162
24-20     2     22    44

N = 50    ΣfXm = 1905

$\bar{x} = \frac{\sum f X_m}{N} = \frac{1905}{50} = 38.1$

Using Short Method

Classes   f     d     fd
54-50     4     3     12
49-45     7     2     14
44-40     12    1     12
39-35     10    0     0
34-30     9     -1    -9
29-25     6     -2    -12
24-20     2     -3    -6

N = 50    Σfd = 11    z = 37    i = 5

$\bar{x} = z + \left(\frac{\sum fd}{N}\right) i = 37 + \left(\frac{11}{50}\right)(5) = 38.1$
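As a quick check that the two methods agree, here is a minimal Python sketch. The function names `mean_midpoint` and `mean_short` are made up for this illustration; classes are listed highest first, as in the tables above, and `origin_index` is the position of the class chosen as origin.

```python
# (lower limit, upper limit, frequency), highest class first.
classes = [(50, 54, 4), (45, 49, 7), (40, 44, 12), (35, 39, 10),
           (30, 34, 9), (25, 29, 6), (20, 24, 2)]

def mean_midpoint(rows):
    """Midpoint method: sum of f * Xm divided by N."""
    n = sum(f for _, _, f in rows)
    return sum(f * (lower + upper) / 2 for lower, upper, f in rows) / n

def mean_short(rows, origin_index, size):
    """Short method: z + (sum of f * d / N) * i."""
    n = sum(f for _, _, f in rows)
    lower, upper, _ = rows[origin_index]
    z = (lower + upper) / 2
    # Rows run from highest to lowest class, so d = origin_index - position.
    sum_fd = sum(f * (origin_index - k) for k, (_, _, f) in enumerate(rows))
    return z + sum_fd / n * size

print(round(mean_midpoint(classes), 1))
print(round(mean_short(classes, origin_index=3, size=5), 1))
```

Choosing a different origin changes z and Σfd but not the resulting mean, which is why the short method only needs an origin "at or near" the middle.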

MEDIAN

Steps:

Find N/2.

Find the accumulated sum of the frequencies up to the sum that contains N/2.

Use the formula:

$Md = L + \left(\frac{\frac{N}{2} - cf}{f}\right) i$

where L = lower limit of the class which contains N/2
f = frequency of the class containing N/2
cf = cumulative frequency of the classes below the class containing N/2 (the cumulative sum that approaches or is equal to N/2)


MODE

Rough Mode (R. Mo) - obtained by inspection and is equal to the Xm of the class having the highest frequency.

Theoretical Mode (T. Mo)

$T.Mo = 3Md - 2\bar{x}$

Problem:

For the given frequency distribution in the previous problem, compute for the:

Median
R. Mode
T. Mode

Solution:

Computing for the Median

Classes   f     cf
54-50     4     50
49-45     7     46
44-40     12    39
39-35     10    27
34-30     9     17
29-25     6     8
24-20     2     2

N = 50    i = 5

$\frac{N}{2} = \frac{50}{2} = 25$

$Md = L + \left(\frac{\frac{N}{2} - cf}{f}\right) i = 35 + \left(\frac{25 - 17}{10}\right)(5)$

$Md = 39$

Computing for the Mode

Classes   f
54-50     4
49-45     7
44-40     12
39-35     10
34-30     9
29-25     6
24-20     2

N = 50

The class 44-40 has the highest frequency, so R. Mode = 42.

$T.Mode = 3Md - 2\bar{x}$

Since $\bar{x} = 38.1$ and $Md = 39$:

$T.Mode = 3(39) - 2(38.1)$

$T.Mode = 40.8$
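The median and mode computations can be sketched in Python as follows. The function names are hypothetical, and the sketch follows the slides' convention of using the lower class limit as L. Classes are listed lowest first here so that the cumulative frequency can be built in one pass.

```python
# (lower limit, upper limit, frequency), lowest class first.
rows = [(20, 24, 2), (25, 29, 6), (30, 34, 9), (35, 39, 10),
        (40, 44, 12), (45, 49, 7), (50, 54, 4)]

def grouped_median(rows, size):
    """Md = L + ((N/2 - cf) / f) * i, with cf taken below the median class."""
    n = sum(f for _, _, f in rows)
    cf = 0
    for lower, _, f in rows:
        if f > 0 and cf + f >= n / 2:
            return lower + (n / 2 - cf) / f * size
        cf += f
    raise ValueError("empty distribution")

def rough_mode(rows):
    """Class mark of the class with the highest frequency."""
    lower, upper, _ = max(rows, key=lambda row: row[2])
    return (lower + upper) / 2

def theoretical_mode(median, mean):
    """T. Mode = 3 Md - 2 * mean."""
    return 3 * median - 2 * mean

median = grouped_median(rows, 5)
print(round(median, 2), rough_mode(rows),
      round(theoretical_mode(median, 38.1), 1))
```

The mean of 38.1 is passed in from the earlier computation rather than recomputed.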


Other Measures of Position

Quartiles
Deciles
Percentiles

Quartiles - those which divide the distribution into 4 parts

$Q_k = L + \left(\frac{\frac{kN}{4} - cf}{f}\right) i$

Deciles - those which divide the distribution into 10 parts

$D_k = L + \left(\frac{\frac{kN}{10} - cf}{f}\right) i$

Percentiles - those which divide the distribution into 100 parts

$P_k = L + \left(\frac{\frac{kN}{100} - cf}{f}\right) i$


Problem:

For the given frequency distribution in the previous problem, compute for:

Q1

D3

P88

Solution:

Classes   f     cf
54-50     4     50
49-45     7     46
44-40     12    39
39-35     10    27
34-30     9     17
29-25     6     8
24-20     2     2

N = 50    i = 5

Computing for Q1

$\frac{kN}{4} = \frac{(1)50}{4} = 12.5$

$Q_1 = L + \left(\frac{\frac{kN}{4} - cf}{f}\right) i = 30 + \left(\frac{12.5 - 8}{9}\right)(5)$

$Q_1 = 32.5$

Computing for D3

$\frac{kN}{10} = \frac{(3)50}{10} = 15$

$D_3 = L + \left(\frac{\frac{kN}{10} - cf}{f}\right) i = 30 + \left(\frac{15 - 8}{9}\right)(5)$

$D_3 = 33.89$

Computing for P88

$\frac{kN}{100} = \frac{(88)50}{100} = 44$

$P_{88} = L + \left(\frac{\frac{kN}{100} - cf}{f}\right) i = 45 + \left(\frac{44 - 39}{7}\right)(5)$

$P_{88} = 48.57$
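Since the quartile, decile and percentile formulas differ only in the divisor (4, 10 or 100), one function can cover all three. The sketch below is an illustration under that observation; `grouped_quantile` and its parameter names are made up for this example, and L is the lower class limit as in the slides.

```python
# (lower limit, upper limit, frequency), lowest class first.
rows = [(20, 24, 2), (25, 29, 6), (30, 34, 9), (35, 39, 10),
        (40, 44, 12), (45, 49, 7), (50, 54, 4)]

def grouped_quantile(rows, k, parts, size):
    """Return L + ((k*N/parts - cf) / f) * i.

    parts = 4 gives quartiles, 10 deciles, 100 percentiles.
    cf is the cumulative frequency below the class containing k*N/parts.
    """
    n = sum(f for _, _, f in rows)
    target = k * n / parts
    cf = 0
    for lower, _, f in rows:
        if f > 0 and cf + f >= target:
            return lower + (target - cf) / f * size
        cf += f
    raise ValueError("k must not exceed parts")

q1 = grouped_quantile(rows, 1, 4, 5)
d3 = grouped_quantile(rows, 3, 10, 5)
p88 = grouped_quantile(rows, 88, 100, 5)
print(round(q1, 2), round(d3, 2), round(p88, 2))
```

With `k=2, parts=4` the same function reproduces the median formula.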


For Grouped Data ( > 30 values)

MEASURES OF VARIATION

RANGE

The range is computed as the difference between the upper limit of the highest class interval and the lower limit of the lowest class interval.

VARIANCE

$\sigma^2 = \frac{\sum f (x_m - \bar{x})^2}{N}$

STANDARD DEVIATION

$\sigma = \sqrt{\frac{\sum f (x_m - \bar{x})^2}{N}}$

MEAN DEVIATION

$D = \frac{\sum f \lvert x_m - \bar{x} \rvert}{N}$

QUARTILE DEVIATION

$Q = \frac{Q_3 - Q_1}{2}$

Problem:

For the given frequency distribution, determine:

variance
standard deviation
mean deviation
quartile deviation


Classes   f
89-85     1
84-80     1
79-75     2
74-70     3
69-65     4
64-60     4
59-55     7
54-50     6
49-45     6
44-40     6
39-35     3
34-30     1

Solution:

Computing for the Mean

Classes   f     Xm    fXm
89-85     1     87    87
84-80     1     82    82
79-75     2     77    154
74-70     3     72    216
69-65     4     67    268
64-60     4     62    248
59-55     7     57    399
54-50     6     52    312
49-45     6     47    282
44-40     6     42    252
39-35     3     37    111
34-30     1     32    32

N = 44    ΣfXm = 2443

$\bar{x} = \frac{\sum f X_m}{N} = \frac{2443}{44}$

$\bar{x} = 55.5$

Computing for the Variance

Classes   f     xm – x̄    (xm – x̄)²    f(xm – x̄)²
89-85     1     31.5      992.25       992.25
84-80     1     26.5      702.25       702.25
79-75     2     21.5      462.25       924.50
74-70     3     16.5      272.25       816.75
69-65     4     11.5      132.25       529.00
64-60     4     6.5       42.25        169.00
59-55     7     1.5       2.25         15.75
54-50     6     -3.5      12.25        73.50
49-45     6     -8.5      72.25        433.50
44-40     6     -13.5     182.25       1093.50
39-35     3     -18.5     342.25       1026.75
34-30     1     -23.5     552.25       552.25

N = 44    Σf(xm – x̄)² = 7329

$\sigma^2 = \frac{\sum f (x_m - \bar{x})^2}{N} = \frac{7329}{44}$

$\sigma^2 = 166.57$

Computing for the Standard Deviation

Since $\sigma^2 = 166.57$:

$\sigma = \sqrt{166.57}$

$\sigma = 12.906$
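The variance table can be reproduced with a short Python sketch. The function name `grouped_variance` is made up for this illustration. Note one assumption: the mean is passed in rounded to 55.5, as in the table above; using the unrounded mean 2443/44 changes the variance only in the fourth decimal place.

```python
import math

# (lower limit, upper limit, frequency), highest class first.
rows = [(85, 89, 1), (80, 84, 1), (75, 79, 2), (70, 74, 3),
        (65, 69, 4), (60, 64, 4), (55, 59, 7), (50, 54, 6),
        (45, 49, 6), (40, 44, 6), (35, 39, 3), (30, 34, 1)]

def grouped_variance(rows, mean):
    """Sum of f * (xm - mean)^2 divided by N (population form, as on the slides)."""
    n = sum(f for _, _, f in rows)
    total = sum(f * ((lower + upper) / 2 - mean) ** 2 for lower, upper, f in rows)
    return total / n

variance = grouped_variance(rows, mean=55.5)
std_dev = math.sqrt(variance)
print(round(variance, 2), round(std_dev, 3))
```

The divisor is N rather than N - 1, matching the formula given for grouped data.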


Computing for the Mean Deviation

Classes   f     |xm – x̄|    f|xm – x̄|
89-85     1     31.5        31.5
84-80     1     26.5        26.5
79-75     2     21.5        43.0
74-70     3     16.5        49.5
69-65     4     11.5        46.0
64-60     4     6.5         26.0
59-55     7     1.5         10.5
54-50     6     3.5         21.0
49-45     6     8.5         51.0
44-40     6     13.5        81.0
39-35     3     18.5        55.5
34-30     1     23.5        23.5

N = 44    Σf|xm – x̄| = 465

$D = \frac{\sum f \lvert x_m - \bar{x} \rvert}{N} = \frac{465}{44}$

$D = 10.6$

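Computing for the Quartile Deviation

Classes   f     cf
89-85     1     44
84-80     1     43
79-75     2     42
74-70     3     40
69-65     4     37
64-60     4     33
59-55     7     29
54-50     6     22
49-45     6     16
44-40     6     10
39-35     3     4
34-30     1     1

N = 44    i = 5

$Q_k = L + \left(\frac{\frac{kN}{4} - cf}{f}\right) i$

For Q1: $\frac{kN}{4} = \frac{1(44)}{4} = 11$, which falls in the class 49-45 (cf below this class = 10).

$Q_1 = 45 + \left(\frac{11 - 10}{6}\right)(5) = 45.83$

For Q3: $\frac{kN}{4} = \frac{3(44)}{4} = 33$, which falls in the class 64-60 (cf below this class = 29).

$Q_3 = 60 + \left(\frac{33 - 29}{4}\right)(5) = 65$

$Q = \frac{Q_3 - Q_1}{2} = \frac{65 - 45.83}{2}$

$Q = 9.585$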
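The mean deviation and quartile deviation can be sketched in Python in the same way. The function names are hypothetical. The sketch assumes that cf is the cumulative frequency below the class containing kN/4 and that L is the lower class limit; under that convention Q3 evaluates to 65, because 29 values lie below the class 64-60 and the 33rd value closes that class.

```python
# (lower limit, upper limit, frequency), lowest class first.
rows = [(30, 34, 1), (35, 39, 3), (40, 44, 6), (45, 49, 6),
        (50, 54, 6), (55, 59, 7), (60, 64, 4), (65, 69, 4),
        (70, 74, 3), (75, 79, 2), (80, 84, 1), (85, 89, 1)]

def mean_deviation(rows, mean):
    """Sum of f * |xm - mean| divided by N."""
    n = sum(f for _, _, f in rows)
    return sum(f * abs((lower + upper) / 2 - mean) for lower, upper, f in rows) / n

def quartile(rows, k, size):
    """Q_k = L + ((k*N/4 - cf) / f) * i, with cf taken below the quartile class."""
    n = sum(f for _, _, f in rows)
    target = k * n / 4
    cf = 0
    for lower, _, f in rows:
        if f > 0 and cf + f >= target:
            return lower + (target - cf) / f * size
        cf += f
    raise ValueError("k must be between 1 and 4")

def quartile_deviation(rows, size):
    """Q = (Q3 - Q1) / 2."""
    return (quartile(rows, 3, size) - quartile(rows, 1, size)) / 2

# The mean is rounded to 55.5, as in the worked tables.
print(round(mean_deviation(rows, 55.5), 1))
print(round(quartile(rows, 1, 5), 2), quartile(rows, 3, 5))
print(round(quartile_deviation(rows, 5), 2))
```

Rounding Q1 to 45.83 before subtracting gives 9.585 by hand, while the unrounded value is about 9.58.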

Types of Graphs

BAR GRAPH

The bar graph is particularly useful in presenting data gathered from discrete variables on a nominal scale. It uses rectangles or bars to represent discrete classes of data. The base of each bar corresponds to a class interval of the frequency distribution and the heights of the bars represent the frequencies associated with each class.


HISTOGRAM

The histogram is similar to a bar chart, but the bases of the bars are the class boundaries rather than the class limits.

FREQUENCY POLYGON

A frequency polygon is a line graph of class frequencies plotted against class marks.

Problem:

For the following frequency distribution, construct:

bar graph
histogram
frequency polygon

Classes   f
54-50     4
49-45     7
44-40     12
39-35     10
34-30     9
29-25     6
24-20     2

BAR GRAPH

[Figure: bar graph of Frequency (vertical axis, 0 to 15) for the classes 20-24, 25-29, 30-34, 35-39, 40-44, 45-49 and 50-54; horizontal axis labeled Class Marks]


HISTOGRAM

[Figure: histogram of Frequency (vertical axis, 0 to 15) against Class Boundaries; the bars are labeled with the class frequencies 2, 6, 9, 10, 12, 7 and 4]

FREQUENCY POLYGON

[Figure: frequency polygon of Frequency (vertical axis, 0 to 15) for the classes 20-24, 25-29, 30-34, 35-39, 40-44, 45-49 and 50-54; horizontal axis labeled Classes]

PIE CHART

A pie chart is used to represent quantities that make up a whole.

Problem:

The following table classifies enrolment in a certain university. Construct a pie chart to show the enrolment distribution.

Engineering        5280
Commerce           3000
Education          1800
Arts & Sciences    1320
Law                600

[Figure: pie chart with sectors for Engineering, Commerce, Education, Arts & Sciences and Law]
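Before drawing the pie chart, each category's share of the whole has to be converted to a percentage and a sector angle. The sketch below shows that arithmetic in Python; the variable names are made up for this illustration, and only the enrolment figures come from the problem.

```python
enrolment = {
    "Engineering": 5280,
    "Commerce": 3000,
    "Education": 1800,
    "Arts & Sciences": 1320,
    "Law": 600,
}

total = sum(enrolment.values())

# For each college: (percentage of total, central angle in degrees).
sectors = {
    name: (count / total * 100, count / total * 360)
    for name, count in enrolment.items()
}

for name, (percent, angle) in sectors.items():
    print(f"{name:<16} {percent:5.1f} %  {angle:6.1f} degrees")
```

The angles always add up to 360 degrees, which is a convenient check before plotting.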


CUMULATIVE FREQUENCY CURVE (Ogive Curve)

An ogive curve is a line graph obtained by plotting values from the tabular arrangement by class intervals whose frequencies are cumulated. From this curve, the centile rank of a certain score can be determined. A centile rank denotes the percentage of scores that fall below a specified score in a distribution.

Problem:

Construct the ogive curve for the given frequency distribution. What scores correspond to C50? C88? What is the centile rank of a score of 50?

Classes   f     cf     CP (cf/N x 100)
64-60     2     376    100.0
59-55     12    374    99.5
54-50     20    362    96.3
49-45     32    342    91.0
44-40     46    310    82.4
39-35     58    264    70.2
34-30     64    206    54.8
29-25     58    142    37.8
24-20     42    84     22.3
19-15     23    42     11.2
14-10     15    19     5.1
9-5       4     4      1.1

N = 376

[Figure: ogive curve of CP (vertical axis, 0 to 120) plotted against the upper class limits UL (9, 14, 19, 24, 29, 34, 39, 44, 49, 54, 59, 64)]

From the curve: C50 = 33, C88 = 48, and a score of 50 = C91.
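The cf and CP columns that the ogive is plotted from can be generated with a short Python sketch. The function name `ogive_points` is made up for this illustration; it returns one point per class, using the upper class limit (UL) as the horizontal coordinate, as in the figure.

```python
# (lower limit, upper limit, frequency), lowest class first.
rows = [(5, 9, 4), (10, 14, 15), (15, 19, 23), (20, 24, 42),
        (25, 29, 58), (30, 34, 64), (35, 39, 58), (40, 44, 46),
        (45, 49, 32), (50, 54, 20), (55, 59, 12), (60, 64, 2)]

def ogive_points(rows):
    """Return (upper limit, cf, CP) per class, where CP = cf / N * 100."""
    n = sum(f for _, _, f in rows)
    points = []
    cf = 0
    for _, upper, f in rows:
        cf += f
        points.append((upper, cf, cf / n * 100))
    return points

points = ogive_points(rows)
for upper, cf, cp in points:
    print(f"UL {upper:>2}  cf {cf:>3}  CP {cp:5.1f}")
```

Values read off a hand-drawn curve, such as the centile answers above, are approximate and may differ slightly from values interpolated between these points.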


Kurtosis and Skewness

The measures of skewness and kurtosis indicate the extent of departure of a distribution from normal and permit comparison of two or more distributions.

KURTOSIS (ku)

Kurtosis refers to the flatness or peakedness of a frequency distribution. It shows the shape of the curve of one distribution in relation to that of another distribution. The coefficient of kurtosis is given by:

$ku = \frac{Q}{P_{90} - P_{10}}$

where Q is the quartile deviation.

Types of Kurtosis

mesokurtic (ku = 0.263)

leptokurtic (ku < 0.263)

platykurtic (ku > 0.263)

SKEWNESS (sk)

Skewness refers to the symmetry or asymmetry of a frequency distribution. The coefficient of skewness is given by:

$sk = \frac{3(\bar{x} - md)}{s}$

where md is the median and s is the standard deviation.


If sk = 0, the distribution is symmetric, as in a normal distribution.

$(\bar{X} = Md = Mo)$

If sk < 0, the distribution is negatively skewed.

$(Mo > Md > \bar{X})$

If sk > 0, the distribution is positively skewed.

$(\bar{X} > Md > Mo)$

Problem:

For a certain frequency distribution, the following data are given:

$s = 13.7$    $md = 147.25$    $\bar{x} = 147$

$Q_1 = 138$    $Q_3 = 155.8$

$D_1 = 128.8$    $P_{90} = 167.5$

Determine the kurtosis and skewness of the distribution. Is it a normal distribution?


Solution:

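$ku = \frac{Q}{P_{90} - P_{10}} = \frac{(Q_3 - Q_1)/2}{P_{90} - D_1}$

$ku = \frac{(155.8 - 138)/2}{167.5 - 128.8} = 0.23$

Since ku < 0.263, the distribution is leptokurtic.

$sk = \frac{3(\bar{x} - md)}{s}$

$sk = \frac{3(147 - 147.25)}{13.7} = -0.05$

Since sk < 0, the distribution is negatively skewed.

Because ku is not 0.263 and sk is not 0, the distribution is not normal.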
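The two coefficients are simple enough to check with a few lines of Python. This is a sketch with made-up function names; it assumes the median used in the solution (147.25) and takes P10 equal to D1, since the first decile and the tenth percentile are the same point.

```python
def kurtosis_coefficient(q1, q3, p10, p90):
    """ku = Q / (P90 - P10), where Q = (Q3 - Q1) / 2 is the quartile deviation."""
    return ((q3 - q1) / 2) / (p90 - p10)

def skewness_coefficient(mean, median, s):
    """sk = 3 * (mean - median) / s."""
    return 3 * (mean - median) / s

ku = kurtosis_coefficient(q1=138, q3=155.8, p10=128.8, p90=167.5)
sk = skewness_coefficient(mean=147, median=147.25, s=13.7)

print(round(ku, 2), "leptokurtic" if ku < 0.263 else "not leptokurtic")
print(round(sk, 2), "negatively skewed" if sk < 0 else "not negatively skewed")
```

Both values are close to the normal-curve benchmarks (0.263 and 0), so the departure from normality here is slight.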

Student Activity

Part I. Answer the following:

1. Define each of the following:
a. class mark
b. ogive
c. histogram
d. frequency polygon

2. What advantages does each of the following forms of presenting data offer?
a. textual
b. tabular
c. graphical


3. Distinguish between:
a. class limits and class boundaries
b. skewness and kurtosis

4. Give the class mark, the class boundaries and the interval size for each of the following:
a. 10 – 19
b. 1.5 – 5.0
c. 12.85 – 13.43

Part II. Solve the following using Microsoft Excel Applications.

The list below gives the weekly food budget and weekly incomes for 39 households.

Food Budget   Weekly Income   Food Budget   Weekly Income
1598          1553            1639          1636
1680          1740            1655          1677
1660          1652            1736          1761
1583          1581            1587          1603
1476          1481            1622          1605
1633          1634            1689          1631
1717          1692            1700          1765
1596          1561            1613          1688
1613          1566            1615          1667
1607          1626            1458          1479
1728          1699            1750          1747
1672          1685            1700          1673
1572          1589            1654          1641
1634          1571            1625          1613
1461          1443            1565          1521
1726          1712            1563          1583
1732          1724            1566          1542
1620          1628            1587          1567
1616          1564            1584          1610
1579          1526

1. Construct a frequency distribution table for food budget using i = 25 and determine:

a. mean

b. median

c. rough and theoretical mode

d. skewness

2. Construct a frequency distribution table for weekly income using i = 25 and determine:

a. standard deviation

b. mean deviation

c. quartile deviation

d. kurtosis

3. Plot a bar chart for food budget and superimpose on it the frequency polygon for weekly income.


4. Take the difference between weekly income and food budget for each household and construct a frequency distribution and cumulative frequency distribution.

5. Plot the ogive curve for the data in (4). What score corresponds to a centile rank of 71?

Proceed to Topic 4