Lesson 7 measures of dispersion part 1

Introduction to Statistics for Built Environment Course Code: AED 1222 Compiled by DEPARTMENT OF ARCHITECTURE AND ENVIRONMENTAL DESIGN (AED) CENTRE FOR FOUNDATION STUDIES (CFS) INTERNATIONAL ISLAMIC UNIVERSITY MALAYSIA


Page 2: Lesson 7 measures of dispersion part 1

Lecture 8

Today’s Lecture:
• The range
• Quartiles and the interquartile range
• Percentiles
• Percentile rank
• The five number summary

Measures of variability/dispersion Part I

Page 3: Lesson 7 measures of dispersion part 1

What is/are Measures of Variation/Dispersion?

●If the data values are widely dispersed, the measure of central location (such as the mean or median) is said to be less representative of the data as a whole.
●If the data values are closely clustered, the measure of central location is considered more reliable.

Measures of Variation/Dispersion

●Measures of variation/dispersion provide information on the spread, or variability, of the data values.

Page 4: Lesson 7 measures of dispersion part 1

Business Statistics: A Decision-Making Approach, 7e © 2008 Prentice-Hall, Inc.

• Describes how spread out the data are, i.e. their variability

[Figure: same center, different variation/dispersion; panels: close dispersion, wide dispersion]

Shape of a Distribution and Variability / Dispersion

Measures of variation/dispersion cont.

Page 5: Lesson 7 measures of dispersion part 1


[Figure: right-skewed, left-skewed and symmetric distributions, each marked with Q1, Q2 and Q3]

Shape of a Distribution and Variability / Dispersion

• A distribution's shape is either symmetric or skewed

Measures of variation/dispersion cont.

Page 6: Lesson 7 measures of dispersion part 1

The range

●The range is simply the difference between the largest and the smallest observed values in a data set. Thus the range, which includes any outliers, measures the total spread of the data.

●A great deal of information is ignored when computing the range, since only the largest and smallest data values are considered.

Range = difference between highest and lowest observed values

What is/are Range?

Page 7: Lesson 7 measures of dispersion part 1


• Simplest measure of variation
• Difference between the largest and the smallest observations:

Range = $x_{\text{maximum}} - x_{\text{minimum}}$

Example: [Figure: observations plotted on a number line from 0 to 14]

Range = 14 - 1 = 13

The range cont.

Find The Range

Page 8: Lesson 7 measures of dispersion part 1

●The range value of a data set is greatly influenced by the presence of just one unusually large or small value (outlier).

●The range can be expressed as an interval such as 4–10, where 4 is the lowest value and 10 is highest.

●Often, it is expressed as the width of the interval. For example, the range 4–10 can also be expressed as a range of 6.

The range cont.

Page 9: Lesson 7 measures of dispersion part 1


• Ignores the way in which data are distributed

• Sensitive to outliers

[Figure: two data sets plotted on the same scale from 7 to 12 and distributed differently; both have Range = 12 - 7 = 5]

Disadvantages of the Range

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5

1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120

Range = 5 - 1 = 4

Range = 120 - 1 = 119

The range cont.
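As a minimal sketch (Python is used here only for illustration), the range can be computed straight from its definition. The two lists are the two data sets from the disadvantages slide, which differ only in their last value; the function name `data_range` is hypothetical.

```python
def data_range(values):
    """Range = largest observation - smallest observation (outliers included)."""
    if not values:
        raise ValueError("the range of an empty data set is undefined")
    return max(values) - min(values)


# The two data sets from the slide; they differ only in their last value.
without_outlier = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 5]
with_outlier = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 120]

print(data_range(without_outlier))  # 5 - 1 = 4
print(data_range(with_outlier))     # 120 - 1 = 119
```

A single changed value moves the range from 4 to 119 even though the other 24 values are identical, which is exactly the sensitivity to outliers described above.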

Page 10: Lesson 7 measures of dispersion part 1

●Another disadvantage of using the range is that it does not measure the spread of the majority of values in a data set; it only measures the spread between the highest and lowest values.

●As a result, other measures are required in order to give a better picture of the data spread.

●The range is an informative tool used as a supplement to other measures such as the standard deviation or semi-interquartile range, but it should rarely be used as the only measure of spread.

The range cont.

Page 11: Lesson 7 measures of dispersion part 1

1, 2, 4, 6, 12, 15, 19, 26

Smallest value = 1; largest value = 26

The range does not take into account how clumped together the scores are.

Range = 26 - 1 = 25

Question: Is the range a good measure of spread/dispersion?

Question: Is the value 25 a good representative value?

The range cont.

Example 1:

Page 12: Lesson 7 measures of dispersion part 1

425, 430, 430, 435, 435, 435, 435, 435, 440, 440, 440, 440, 440, 445, 445, 445, 445, 445, 450, 450

450, 450, 450, 450, 450, 460, 460, 460, 465, 465, 465, 470, 470, 472, 475, 475, 475, 480, 480, 480

480, 485, 490, 490, 490, 500, 500, 500, 500, 510, 515, 525, 525, 525, 535, 549, 550, 570, 570

575, 575, 580, 590, 600, 600, 600, 600, 615, 615, 510

Smallest value = 425; largest value = 615

The range does not take into account how clumped together the scores are.

Range = 615 – 425 = 190

Question: Is the range a good measure of spread/dispersion?

The range cont.

Example 2:

Page 13: Lesson 7 measures of dispersion part 1

The quartiles

●In descriptive statistics, a quartile is any of the three values which divide the sorted (arrayed) data set into four equal parts, so that each part represents one fourth of the sampled population.

●The median divides the data into two equal sets.

●The lower quartile (Q1) is the middle value of the first (lower) half of the data, so that 25% of the values are smaller than Q1 and 75% are larger.

●The upper quartile (Q3) is the middle value of the second (upper) half of the data, so that 75% of the values are smaller than Q3 and 25% are larger.

What is/are Quartiles?

Page 14: Lesson 7 measures of dispersion part 1


• Quartiles split the ranked data into 4 equal groups:

25% 25% 25% 25%

Q1 Q2 Q3

The quartiles cont.

> first quartile (designated Q1) = lower quartile = cuts off lowest 25% of data = 25th percentile

> second quartile (designated Q2) = median = cuts off data set in half (50% of data) = 50th percentile

> third quartile (designated Q3) = upper quartile = cuts off highest 25% of data, or lowest 75% = 75th percentile

• Note that the second quartile Q2 (the 50th percentile) is the median

Page 15: Lesson 7 measures of dispersion part 1

The formula for locating the position of the observation at a given percentile, y, with n data points sorted in ascending order is:

$i = L_y = \frac{y}{100}\,n$

• Case 1: If $L_y$ is a whole number, the value is found halfway between positions $L_y$ and $L_y + 1$.
• Case 2: If $L_y$ is a decimal, round up to the next whole number (for example, $L_y = 1.2$ becomes 2).

Locating the position of the quartiles

The quartiles cont.

Page 16: Lesson 7 measures of dispersion part 1


Sample Data in Ordered Array:

11 12 13 16 16 17 18 21 22

Example: Find the first quartile (Q1)

n = 9

Q1 = 25th percentile, so find i: $i = \frac{25}{100}(9) = 2.25$

Since i is not a whole number, round up and use the value in the 3rd position: Q1 = 13

The quartiles cont.
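The locating rule $i = \frac{y}{100}\,n$ and its two cases can be sketched in Python as follows. This is an illustrative version only, assuming the data are already in ascending order, positions are counted from 1, and y is a whole-number percent strictly between 0 and 100; the function name `percentile_value` is hypothetical.

```python
import math


def percentile_value(sorted_values, y):
    """Value at the y-th percentile using the locating rule i = (y / 100) * n.

    Assumes sorted_values is in ascending order, positions are counted from 1,
    and y is a whole-number percent with 0 < y < 100.
    """
    if not 0 < y < 100:
        raise ValueError("y must be strictly between 0 and 100")
    n = len(sorted_values)
    numerator = y * n                # i = numerator / 100, kept as an integer
    if numerator % 100 == 0:
        # Case 1: i is a whole number -> halfway between positions i and i + 1
        i = numerator // 100
        return (sorted_values[i - 1] + sorted_values[i]) / 2
    # Case 2: i is a decimal -> round up and use that position
    i = math.ceil(numerator / 100)
    return sorted_values[i - 1]


sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]   # ordered array, n = 9
print(percentile_value(sample, 25))  # Q1: i = 2.25 -> 3rd position -> 13
print(percentile_value(sample, 50))  # Q2: i = 4.5  -> 5th position -> 16
print(percentile_value(sample, 75))  # Q3: i = 6.75 -> 7th position -> 18
```

Testing whether `y * n` is a multiple of 100 keeps the whole-number check in integer arithmetic, so Case 1 is not missed because of floating-point rounding. Note that Q2 here is the 5th of the nine values, which is the median.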

Page 17: Lesson 7 measures of dispersion part 1

The Interquartile Range

●The interquartile range is another range used as a measure of the spread.

●The difference between upper and lower quartiles (Q3–Q1), which is called the interquartile range, also indicates the dispersion of a data set.

●The interquartile range spans the middle 50% of a data set and eliminates the influence of outliers because, in effect, the highest and lowest quarters are removed.

What is/are Interquartile Range?

Page 18: Lesson 7 measures of dispersion part 1


• Can eliminate some outlier problems by using the interquartile range

• Eliminate some high- and low-valued observations and calculate the range from the remaining values.

• Interquartile range = 3rd quartile – 1st quartile

The interquartile range cont.

Interquartile range = difference between upper quartile (Q3) and lower quartile (Q1)

Page 19: Lesson 7 measures of dispersion part 1


Example:

[Figure: data divided into four 25% regions by Q1, the median (Q2) and Q3]

X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70

Interquartile range = 57 – 30 = 27

The interquartile range cont.

Page 20: Lesson 7 measures of dispersion part 1

An exercise

A year ago, Ali began working at a computer store. His supervisor asked him to keep a record of the number of sales he made each month.

The following data set is a list of his sales for the last 12 months: 34, 47, 1, 15, 57, 24, 20, 11, 19, 50, 28, 37.

• Use Ali's sales records to find:
– the median
– the range
– the upper and lower quartiles
– the interquartile range

Page 21: Lesson 7 measures of dispersion part 1

The values in an ascending array are: 1, 11, 15, 19, 20, 24, 28, 34, 37, 47, 50, 57.

Median position = (12 + 1) ÷ 2 = 6.5th value, so Median = (6th + 7th observations) ÷ 2 = (24 + 28) ÷ 2 = 26

Range = difference between the highest and lowest values = 57 – 1 = 56

Exercise cont.

Page 22: Lesson 7 measures of dispersion part 1

Lower quartile = value of the middle of the first half of the data
Q1 = the median of 1, 11, 15, 19, 20, 24 = (3rd + 4th observations) ÷ 2 = (15 + 19) ÷ 2 = 17

Upper quartile = value of the middle of the second half of the data
Q3 = the median of 28, 34, 37, 47, 50, 57 = (3rd + 4th observations) ÷ 2 = (37 + 47) ÷ 2 = 42

Interquartile range = Q3–Q1 = 42 – 17 = 25

Exercise cont.
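The exercise's median-of-halves calculation can be checked with a short Python sketch. It relies on the standard-library `statistics.median`; splitting the array into halves with `n // 2` as written assumes an even number of values, as with Ali's 12 months.

```python
from statistics import median

sales = [34, 47, 1, 15, 57, 24, 20, 11, 19, 50, 28, 37]
data = sorted(sales)            # ascending array
n = len(data)                   # 12 values (even), so each half holds n // 2 = 6 values

lower_half = data[: n // 2]     # 1, 11, 15, 19, 20, 24
upper_half = data[n // 2 :]     # 28, 34, 37, 47, 50, 57

q2 = median(data)                 # (24 + 28) / 2
value_range = data[-1] - data[0]  # 57 - 1
q1 = median(lower_half)           # (15 + 19) / 2
q3 = median(upper_half)           # (37 + 47) / 2
iqr = q3 - q1                     # 42 - 17

print(q2, value_range, q1, q3, iqr)  # 26.0 56 17.0 42.0 25.0
```

`statistics.median` averages the two middle values when a list has an even length, which is why the medians print as floats.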

Page 23: Lesson 7 measures of dispersion part 1

Exercise cont.

Page 24: Lesson 7 measures of dispersion part 1

Percentiles

●The ath percentile is a value such that roughly a% of the data are smaller and (100 – a)% of the data are larger.

●There are three steps for computing a percentile:
– Organize the data into an ascending array;
– Count the number of values (n);
– Select the observation in position a × (n + 1), with a written as a proportion (for example, a = 0.50 for the 50th percentile).

What is/are Percentile?

Page 25: Lesson 7 measures of dispersion part 1


• The pth percentile in an ordered array of n values is the value in the ith position, where

$i = \frac{p}{100}(n)$

Example: Find the 60th percentile in an ordered array of 19 values.

$i = \frac{60}{100}(19) = 11.4$

If i is not an integer, round up to the next higher integer value.

So use the value in the i = 12th position.

Percentiles cont.

Page 26: Lesson 7 measures of dispersion part 1

You won't always be lucky enough to have a × (n + 1) be a nice whole number. Here are some scenarios:

• If a × (n + 1) is not a whole number, go halfway between the two adjacent observations.
• If a × (n + 1) < 1, select the smallest observation.
• If a × (n + 1) > n, select the largest observation.

Percentiles cont.

Page 27: Lesson 7 measures of dispersion part 1

Percentiles cont.

Example: Compute the 50th percentile for the following data set: 73, 58, 67, 93, 33, 18, 147

• Arrayed data: 18, 33, 58, 67, 73, 93, 147
• There are 7 observations (n = 7).
• Select the 0.50 × (7 + 1) = 4th observation.
• Therefore, the 50th percentile equals 67.

– Notice that there are three observations larger than 67 and three observations smaller than 67.

Page 28: Lesson 7 measures of dispersion part 1

Percentiles cont.

Example: (Cont.) 18, 33, 58, 67, 73, 93, 147

Suppose we want to compute the 20th percentile…
• Notice that a × (n + 1) = 0.20 × (7 + 1) = 1.6. This is not a whole number, so we select halfway between the 1st and 2nd observations: (18 + 33) ÷ 2 = 25.5.

Suppose we want to compute the 10th percentile…
• Since 0.10 × (7 + 1) = 0.8 is less than 1, we select the smallest observation, which is 18.
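The a × (n + 1) rule and its special cases can be sketched in Python as follows. Assumptions: the percentile is passed as a whole-number percent p (so a = p / 100), and `fractions.Fraction` keeps the position exact so the whole-number test is not disturbed by floating-point rounding. The function name `percentile_np1` is hypothetical. This "halfway" convention is not the interpolation most software uses, so a spreadsheet or library percentile function may return a different value for the same data.

```python
import math
from fractions import Fraction


def percentile_np1(values, p):
    """p-th percentile via the position a * (n + 1), where a = p / 100.

    Rules from the lecture:
      - position < 1            -> smallest observation
      - position > n            -> largest observation
      - whole-number position   -> the observation at that position (1-based)
      - otherwise               -> halfway between the two adjacent observations
    """
    data = sorted(values)                 # ascending array
    n = len(data)
    pos = Fraction(p, 100) * (n + 1)      # exact a * (n + 1); p must be an int
    if pos < 1:
        return data[0]
    if pos > n:
        return data[-1]
    if pos.denominator == 1:
        return data[int(pos) - 1]
    k = math.floor(pos)                   # observation just below the position
    return (data[k - 1] + data[k]) / 2    # halfway between positions k and k + 1


example = [73, 58, 67, 93, 33, 18, 147]
print(percentile_np1(example, 50))  # position 4   -> 67
print(percentile_np1(example, 20))  # position 1.6 -> (18 + 33) / 2 = 25.5
print(percentile_np1(example, 10))  # position 0.8 -> smallest, 18
```

The boundary checks come before the whole-number test because a position such as 0.8 is also "not a whole number", yet the lecture's rule selects the smallest observation for it.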

Page 29: Lesson 7 measures of dispersion part 1

• Percentile rank of a score is the percentage of scores in its frequency distribution which are lower than it.

• Percentile ranks are commonly used to clarify the interpretation of scores on standardized tests.

• Given formula: Pr = (number of values below the score + 0.5) ÷ (total number of values) × 100%.

Percentile Rank

What is/are Percentile Rank?

Page 30: Lesson 7 measures of dispersion part 1


Sample Data in Ordered Array:

11 12 13 16 16 17 18 21 22

Example: Find the percentile rank for (17)

n = 9

Pr(17) = [(number of values below the score of 17 + 0.5) ÷ number of scores] × 100

= [(5 + 0.5) ÷ 9] × 100

≈ 61.1, i.e. about the 61st percentile.

This value indicates that the student who obtained a score of 17 marks performed better than roughly 61% of the members of his/her class.

Percentile Rank cont.
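A minimal Python sketch of the percentile-rank formula, reproducing the score-of-17 example. It follows the lecture's formula exactly, which adds 0.5 once and therefore assumes the score occurs only once in the data; the function name `percentile_rank` is hypothetical.

```python
def percentile_rank(values, score):
    """Pr = (number of values below the score + 0.5) / (total number of values) * 100.

    Follows the lecture's formula, which adds 0.5 once, so it assumes the score
    occurs exactly once in the data.
    """
    below = sum(1 for v in values if v < score)   # values strictly lower than the score
    return (below + 0.5) / len(values) * 100


scores = [11, 12, 13, 16, 16, 17, 18, 21, 22]
pr = percentile_rank(scores, 17)
print(round(pr, 1))  # (5 + 0.5) / 9 * 100 -> 61.1
```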

Page 31: Lesson 7 measures of dispersion part 1

The five number summary

A five number summary uses percentiles to describe a set of data. The five number summary consists of:
• MAX – the maximum value
• 75% – the 75th percentile
• 50% – the 50th percentile (or the median)
• 25% – the 25th percentile
• MIN – the minimum value

The five number summary splits the data into four regions, each of which contains 25% of the data.

What is/are Five Number Summary?

Page 32: Lesson 7 measures of dispersion part 1

• The minimum value is 1
• The lower half is {1, 3, 4}, and the median of that half (the 25th percentile) is 3
• The median (the 50th percentile) is 5
• The upper half is {6, 7, 9}, and the median of that half (the 75th percentile) is 7
• The maximum value is 9


The five number summary cont.

Example: Find the five number summary for the data set.

1, 3, 4, 5, 6, 7, 9
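The five number summary can be sketched in Python with the same median-of-halves idea. Following the example, when n is odd the overall median is left out of both halves; this is one convention among several, and other textbooks or software may include the median in each half or use a percentile formula instead, giving slightly different quartiles. The function name `five_number_summary` is hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """(MIN, 25th, 50th, 75th percentile, MAX) using the median of each half.

    For an odd n the overall median is excluded from both halves,
    as in the example (lower half {1, 3, 4}, upper half {6, 7, 9}).
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]
    upper = data[half + n % 2:]      # skip the middle value when n is odd
    return data[0], median(lower), median(data), median(upper), data[-1]


print(five_number_summary([1, 3, 4, 5, 6, 7, 9]))  # (1, 3, 5, 7, 9)
```

For an even n, such as Ali's 12 monthly sales, the slice `data[half + n % 2:]` starts exactly at the second half, so the same function reproduces the exercise's quartiles of 17 and 42.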

Page 33: Lesson 7 measures of dispersion part 1

Next class…

The following topics will be discussed: Measures of variability / dispersion (Part II):
• The average absolute deviation
• The variance
• The standard deviation
• Coefficient of variation (CV)