!NNC Yr12 maths ch 04 Century Year 12/04... · spread. For set B, the interquartile range is the...


4 Statistical distributions

DATA ANALYSIS

Sydneysiders often like to claim that it’s always raining in Melbourne, but is Melbourne really such a wet city? The following data shows the average number of rainy days per month for the two capital cities, and is supplied by the Bureau of Meteorology.

Is it possible to use this data to compare the rainfalls of the two cities and decide which city is wetter? Does Melbourne generally have more rainy days than Sydney, or is the number of rainy days per month more consistent?

This chapter is about comparing the statistics of data sets and noting any similarities and differences. Do men spend more money than women? Do more people shop on weekends than weekdays? Are teachers generally younger than doctors? Data sets can be compared by examining the shapes of their graphs or by analysing their calculated measures of location and spread.

In this chapter you will learn how to:

• name and use the different types of data and random samples
• calculate measures of location (mean, median, mode)
• calculate measures of spread (range, interquartile range, standard deviation)
• analyse and interpret dot plots, stem-and-leaf plots, box plots and radar charts
• investigate outliers in small data sets and their effects on the mean, median and mode
• describe the shape of a distribution and make conclusions about the data in the distribution
• display and compare two sets of data in double stem-and-leaf plots, double box-and-whisker plots, radar charts and area charts
• interpret data presented in a two-way table form
• use summary statistics and multiple displays to interpret and compare the relationships between two data sets.

Month       J   F   M   A   M   J   J   A   S   O   N   D
Sydney     12  12  13  12  12  12  10  10  10  12  11  12
Melbourne   8   7   9  12  14  14  15  16  15  14  12  11
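One way to start comparing the two cities is to compute a measure of location and two measures of spread for each row of the table. The sketch below is only an illustration: it uses Python's standard `statistics` module, the helper name `summarise` is made up for this example, and the population standard deviation is chosen on the assumption that the twelve monthly averages are the whole data set.

```python
from statistics import mean, pstdev

# Average number of rainy days per month (J to D), copied from the table above.
sydney = [12, 12, 13, 12, 12, 12, 10, 10, 10, 12, 11, 12]
melbourne = [8, 7, 9, 12, 14, 14, 15, 16, 15, 14, 12, 11]

def summarise(days):
    """Return (mean, range, population standard deviation) for one city."""
    return mean(days), max(days) - min(days), pstdev(days)

for city, days in (("Sydney", sydney), ("Melbourne", melbourne)):
    avg, spread, sd = summarise(days)
    print(f"{city}: mean = {avg:.2f}, range = {spread}, sd = {sd:.2f}")
```

On these figures Melbourne has the slightly higher mean, while Sydney's monthly values vary less, which is the kind of comparison the chapter goes on to develop.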

!NNC Yr12 maths ch 04 Page 109 Wednesday, October 4, 2000 1:43 AM


NEW CENTURY MATHS GENERAL: HSC

COLLECTING AND DISPLAYING DATA

Collecting data

Data or information can be collected by a variety of means:

• through observation, such as a naturalist observing animal behaviour
• by experiment, for example a medical researcher testing the effects of a new drug
• from a survey, usually via a telephone poll or a written questionnaire
• by taking a census, that is, surveying the whole population.

Do you still have your statistics file containing graphs and tables collected during the Preliminary Course? You should now add to your file by collecting articles from recent newspapers that contain graphs and tables, especially those that contain more than one statistical display or display data in an interesting way.

Use your library or explore the Internet to find real data. Here are three useful websites:
Australian Bureau of Statistics (ABS): www.abs.gov.au or www.statistics.gov.au
Bureau of Meteorology: www.bom.gov.au
Morgan Surveys: www.roymorgan.com.au

Types of data

Data falls into one of the following types:

• quantitative (numerical) data that is discrete, such as the number of computers in schools
• quantitative (numerical) data that is continuous, such as the weights of gym members
• categorical (qualitative) data, such as the birthplaces of people living in Sydney.

Example 1

What type of data is each of these?
(a) the numbers of people attending Olympic Games
(b) the types of breakfast cereal in Cottonworths supermarket
(c) the body temperature of a hospital patient taken over a 24-hour period

Solution

(a) Quantitative and discrete since the data are distinct whole numbers.
(b) Categorical since the data are brand names of cereals.
(c) Quantitative and continuous since the data can be measured along a continuous scale.

• Quantitative or numerical data is best displayed in a column graph or line graph.
• Categorical or qualitative data is best displayed in a sector graph or divided bar graph.

Why do you think this is so?

Idea: Collecting statistical graphs and tables

Idea: Use the Internet to find real data

Think: Which graph is best?


Random sampling

It is not always convenient to collect data from all members of a population, that is, by using a census. If a population is too large or too difficult to survey, a sample of items can be taken from the population and analysed, and the results used to reflect population characteristics.

• A simple random sample is one where each member of the population is equally likely to be chosen, for example, choosing the winning balls in Lotto.
• A systematic sample is one where the first member is chosen at random and the others are chosen at regular intervals, for example, every 8th toy on a production line.
• A stratified sample is one where a representative sample is taken from each stratum or layer of a population. For example, a stratified sample from a population containing 70% adults and 30% children would contain 70% adults and 30% children.
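The three sampling methods can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than a prescribed procedure: the function names, the 80-toy production line, the 100-person population and the 45-ball Lotto draw are all invented for the example, and the random generator is seeded only so that the sketch gives repeatable results.

```python
import random

def simple_random_sample(population, k, rng):
    """Every member of the population is equally likely to be chosen."""
    return rng.sample(population, k)

def systematic_sample(population, step, rng):
    """Random starting member, then every `step`-th member after it."""
    start = rng.randrange(step)
    return population[start::step]

def stratified_sample(strata, k, rng):
    """Sample each stratum in proportion to its share of the population."""
    total = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Rounding can make the shares miss k by one for awkward proportions.
        share = round(k * len(members) / total)
        sample[name] = rng.sample(members, share)
    return sample

rng = random.Random(2000)  # fixed seed so the sketch is repeatable

lotto = simple_random_sample(list(range(1, 46)), 6, rng)  # assumed 45-ball draw

toys = list(range(1, 81))                    # hypothetical run of 80 toys
every_8th = systematic_sample(toys, 8, rng)  # 10 toys, each 8 apart

people = {
    "adults": [f"adult{i}" for i in range(70)],    # 70% of the population
    "children": [f"child{i}" for i in range(30)],  # 30% of the population
}
survey = stratified_sample(people, 10, rng)  # 7 adults and 3 children
```

The stratified sample keeps the 70:30 split of the population, whereas a simple random sample of 10 people could, by chance, contain any mix of adults and children.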

Exercise 4-01: Collecting and displaying data

1. State whether the data is (i) categorical, (ii) quantitative and discrete, or (iii) quantitative and continuous in each case.
(a) temperature of water in a swimming pool
(b) number of people who voted Liberal in the last four elections
(c) response time when patient's reflexes are tested
(d) religious denomination
(e) breeds of dogs
(f) speed of a car
(g) number of goals scored in a football match
(h) heights of girls in the school athletics team

2. Give two reasons for choosing a sample rather than a census.

3. What are biased and unbiased samples?

4. Which type of random sample (simple, systematic or stratified) would best suit each of the following situations?
(a) random breath testing
(b) opinion poll on whether Australia should change the flag
(c) taste testing a new brand of soft drink

5. Answer the questions that follow these three displays.

(a) [Figure: Drugs used by Australians. Vertical axis: Percentage, 0 to 100. Categories: Alcohol, Tobacco, Cannabis, Painkillers, Sleeping pills, Heroin, Amphetamines, Ecstasy, Cocaine, Hallucinogens.]


(b) [Figure: Newstart Allowance. Categories: less than 20, 21–34, 35–54, 55–59, more than 60. Vertical axis: 0 to 250 000. Legend: 1995, 1996, 1997.]

(c) [Figure: Road fatalities in Australia. Horizontal axis: six-monthly from Mar 95 to Mar 99. Vertical axis: Fatalities per 10 000 population, 0.0 to 3.5.]

(i) State the type of display and information contained in the display.
(ii) Describe the type of data displayed.
(iii) Comment briefly on the strengths and weaknesses of the display.

SUMMARY STATISTICS

Measures of location (or averages)

Measures of location or averages are used to indicate the middle or centre of a data set. There are three measures of location: the mean, median and mode. You should use the measure that is best suited to the type and distribution of the data.

The mode is the most popular value or category.

The centre of a categorical data set is always described by the mode. For example, the modal dress size is the size worn by more women than any other.

The mean is the arithmetic average of all scores: $\bar{x} = \frac{\sum x}{n}$ or $\bar{x} = \frac{\sum fx}{\sum f}$.

The median is the middle score (or average of the two middle scores) when the scores are arranged in ascending order.

The centre of a quantitative data set is usually described by the mean or the median.

The mean takes into account all scores in a data set and can be considered as the ‘balance point’ of the data set. It is, however, affected by very large or very small scores. In distributions where there are outliers, it is better to use the median as the measure of location.


Outliers

An outlier is a very high or very low score that is clearly apart from the other scores.

An outlier can occur for a variety of reasons and should always be investigated. If an outlier is found to be a value obtained through incorrect measurement or observation and is not a typical score, it can be excluded. If the outlier is a possible value from the population, it should be included in the distribution.

[Figure: dot plot of temperatures on a scale from 36 to 42 °C]

Here the outlier temperature is 42°C.

Example 2

Find the mean and median of the two data sets and state which is the more appropriate measure of location for each set:

A: 1 2 3 3 4 7 8
B: 1 2 3 3 4 7 29

Solution

A: Mean $\bar{x} = \frac{1 + 2 + 3 + 3 + 4 + 7 + 8}{7} = 4$
   Median = 3

B: Mean $\bar{x} = \frac{1 + 2 + 3 + 3 + 4 + 7 + 29}{7} = 7$
   Median = 3

The mean can also be found using the statistics mode of a scientific or graphics calculator.

For set A, either the mean or median could be used as the measure of location.

For set B, the median is the better measure of location as the outlier 29 affects the mean.
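The effect of the outlier in Example 2 can be checked with a short script. This is a minimal sketch using Python's standard `statistics` module; the variable names are chosen for this illustration only.

```python
from statistics import mean, median

set_a = [1, 2, 3, 3, 4, 7, 8]
set_b = [1, 2, 3, 3, 4, 7, 29]  # same scores, with 8 replaced by the outlier 29

for name, scores in (("A", set_a), ("B", set_b)):
    print(f"Set {name}: mean = {mean(scores)}, median = {median(scores)}")

mean_shift = mean(set_b) - mean(set_a)        # how far the outlier drags the mean
median_shift = median(set_b) - median(set_a)  # the median does not move
```

Changing one score from 8 to 29 raises the mean by 3 but leaves the median at 3, which is why the median is preferred for set B.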

Just for the record

THE CHALLENGER DISASTER

In 1986, the space shuttle Challenger exploded just after takeoff and seven astronauts were killed. It was found that two rubber O-rings had failed because of the low air temperature. Here is some data from previous flights:

Air temperature (°C)   12   14   19   23   26
Damage index           11    4    0    0    0

An engineer had noticed the outlier (i.e. a damage index of 11 at a temperature of 12°C) and the fact that the expected air temperature at the takeoff time was below 0°C and had recommended that the flight be delayed. Unfortunately, the outlier was not considered important and the flight ended in tragedy.

Can you give another example where an outlier should not be ignored?


Measures of spread

Measures of dispersion or spread are used to indicate how spread out a data set is. As with measures of location, you should use the measure that is best suited to the type and distribution of data.

Range and interquartile range

Range = highest score − lowest score

Interquartile range = upper quartile − lower quartile = Q3 − Q1

The interquartile range is the range of the middle 50% of scores. The upper quartile (Q3) is the median of the upper half of the scores and the lower quartile (Q1) is the median of the lower half. The interquartile range is often a better indicator of spread than the range as it does not take extreme scores into account.

Example 3

Find the range and interquartile range of the two data sets and state which is the more appropriate measure of spread for each set.

A: 1 2 3 3 4 7 8
B: 1 2 3 3 4 7 29

Solution

A:  1  2  3  3  4  7  8
       ↑     ↑     ↑
       Q1    Q2    Q3

Range = 8 − 1 = 7
Interquartile range = 7 − 2 = 5

B:  1  2  3  3  4  7  29
       ↑     ↑     ↑
       Q1    Q2    Q3

Range = 29 − 1 = 28
Interquartile range = 7 − 2 = 5

For set A, either the range or interquartile range could be used as the scores are fairly evenly spread.

For set B, the interquartile range is the better measure of spread as it does not take the outlier score 29 into account.
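The quartile method used in Example 3 (the median of each half of the ordered scores) can be written as a short function. The sketch below assumes, as the worked example does, that when the number of scores is odd the middle score is left out of both halves; other quartile conventions exist and can give slightly different values. The function names are made up for this illustration.

```python
from statistics import median

def quartiles(scores):
    """Return (Q1, Q2, Q3) using the median-of-each-half method."""
    ordered = sorted(scores)
    half = len(ordered) // 2
    lower = ordered[:half]
    # With an odd number of scores the middle score belongs to neither half.
    upper = ordered[half + 1:] if len(ordered) % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

def spread(scores):
    """Return (range, interquartile range)."""
    q1, _, q3 = quartiles(scores)
    return max(scores) - min(scores), q3 - q1

set_a = [1, 2, 3, 3, 4, 7, 8]
set_b = [1, 2, 3, 3, 4, 7, 29]
print(spread(set_a), spread(set_b))
```

Both sets share the same interquartile range, while the range of set B is four times that of set A because of the single outlier.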

Standard deviation

Standard deviation is the most common measure of the spread of a distribution. It is the square root of the average of the squared deviations from the mean.

σₙ is the standard deviation of a population and σₙ₋₁ is the standard deviation of a sample.

σₙ₋₁ is used to approximate the population standard deviation σₙ and gets closer to the population value as the sample size increases.

Use σₙ₋₁ if the data is from a sample (or if you are unsure) and σₙ if all the possible data is given. In either case, always state which standard deviation you are using.

σₙ₋₁ for sample data
σₙ for population data
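To make the definition concrete, the sketch below computes both standard deviations directly from the squared deviations, using set A from the earlier examples. It relies on the standard convention that σₙ divides the sum of squared deviations by n and σₙ₋₁ divides it by n − 1; the function name is invented for this illustration.

```python
from math import sqrt

def standard_deviations(scores):
    """Return (population sd, sample sd) computed from the definition."""
    n = len(scores)
    mean = sum(scores) / n
    squared_deviations = sum((x - mean) ** 2 for x in scores)
    sigma_n = sqrt(squared_deviations / n)                # divide by n
    sigma_n_minus_1 = sqrt(squared_deviations / (n - 1))  # divide by n - 1
    return sigma_n, sigma_n_minus_1

set_a = [1, 2, 3, 3, 4, 7, 8]
population_sd, sample_sd = standard_deviations(set_a)
print(f"population sd = {population_sd:.2f}, sample sd = {sample_sd:.2f}")
```

Because the sample version divides by a smaller number, it is always a little larger than the population version, and the gap shrinks as the number of scores grows.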


Mean and standard deviation from a calculator

The mean and standard deviation of a data set can be calculated using the statistics mode (STAT or SD) of your calculator.

Example 4

Here are the net weekly earnings of 8 labourers:

$730 $490 $600 $440 $490 $370 $700 $580

(a) What is the mean weekly earning?
(b) What is the standard deviation of the earnings?

Solution

Clear any previous data and check that n = 0.
Enter the separate values: 730 [DATA] 490 [DATA] 600 [DATA] … 580 [DATA]
Check that you have entered the correct number of scores by checking that n = 8.

(a) Mean = $550

(b) Standard deviation σₙ₋₁ ≈ $125.40

σₙ₋₁ is the sample standard deviation.

Example 5

Find the standard deviation of the two data sets and state which set is the more widely spread.

A: 1 2 3 3 4 7 8
B: 1 2 3 3 4 7 29

Solution

A: Sample standard deviation σₙ₋₁ ≈ 2.58

B: Sample standard deviation σₙ₋₁ ≈ 9.88

Set B is more widely spread as it has a much larger standard deviation.

Example 6

Twenty possums were captured, tagged and released in the Booderee National Park. Rangers recaptured several samples of 10 possums over a 2-month period and recorded the number of tagged possums in each sample.

No. tagged per sample (score)   0   1   2   3   4   5
No. of samples (frequency)      8  11   5   4   2   1

(a) What is the mean number of tagged possums per sample (correct to 2 decimal places)?
(b) What is the standard deviation of tagged possums (correct to 2 decimal places)?

Solution

Clear any previous data and check that n = 0.

Enter the data: 0 [×] 8 [DATA] 1 [×] 11 [DATA] 2 [×] 5 [DATA] etc.

Check that you have entered the correct number of scores by checking that n = 31.

(a) Mean ≈ 1.48 possums

(b) Standard deviation σₙ₋₁ ≈ 1.36 possums
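The calculator steps for a frequency table such as the one in Example 6 can be mirrored in code by repeating each score as many times as its frequency. The following is a minimal sketch using Python's standard `statistics` module; the dictionary layout and the helper name `expand` are choices made for this illustration.

```python
from statistics import mean, stdev

# Frequency table from Example 6: score -> number of samples with that score.
tagged = {0: 8, 1: 11, 2: 5, 3: 4, 4: 2, 5: 1}

def expand(frequency_table):
    """Turn {score: frequency} into the full list of individual scores."""
    return [score for score, freq in frequency_table.items() for _ in range(freq)]

scores = expand(tagged)
n = len(scores)             # plays the role of the n check on the calculator
mean_tagged = mean(scores)  # same value as sum(f * x) / sum(f)
sample_sd = stdev(scores)   # sample standard deviation (divides by n - 1)

print(f"n = {n}, mean = {mean_tagged:.2f}, sample sd = {sample_sd:.2f}")
```

Expanding the table is fine for small data sets like this one; for large frequencies it would be more economical to work with the sums of f·x and f·x² directly.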


Example 7

The annual salaries of employees at the Nelson manufacturing company are tabulated.

Annual salary (× $1000)   Number of employees
20–<30                    16
30–<40                    12
40–<50                     5
50–<60                     5
60–<70                     6
70–<80                     2

20–<30 means from 20 up to but not including 30.

(a) How many people are employed at the company?
(b) Using class centres 25, 35, 45, …, find the estimated mean salary of the employees.
(c) What is the standard deviation of the salaries?

Solution

Annual salary (× $1000)   Class centre   No. of employees
20–<30                    25             16
30–<40                    35             12
40–<50                    45              5
50–<60                    55              5
60–<70                    65              6
70–<80                    75              2
Total                                    46

(a) Number of employees = 46.

(b) Clear any previous data and check that n = 0.

Enter the data: 25 [×] 16 [DATA] 35 [×] 12 [DATA] 45 [×] 5 [DATA] etc.

Check that n = 46.
Mean ≈ 40.4.
Hence the estimated mean salary is $40 400 per annum.

(c) Standard deviation σₙ ≈ 15.7.

σₙ is the population standard deviation.

Hence the standard deviation of the salaries is approximately $15 700 per annum.

Exercise 4-02: Summary statistics

1. Without using the statistical functions on your calculator, find the mean and median for each data set (correct to 1 decimal place where appropriate). State which is the better measure of location and why.
(a) 2 3 3 5 6 8 9
(b) 26 22 24 29 21 23 24 22
(c) 8 40 38 42 45 29 31 41 30
(d) 6 8 11 9 10 8 11 12 6 7

2. Find the range and interquartile range for each data set in question 1 and state which is the better measure of spread and why.


3. Using your calculator, find the mean and standard deviation (correct to 1 decimal place) for each of these sets of data.
(a) 42, 35, 63, 70, 81, 80, 85
(b) $300, $400, $600, $440, $300, $700, $250, $580, $260
(c) 37.4°F, 38.2°F, 39.0°F, 36.8°F, 38.5°F, 38.0°F, 36.8°F, 40.5°F
(d) 165 kg, 146 kg, 178 kg, 190 kg, 158 kg, 147 kg
(e) 23, 18, 24, 16, 17, 20, 15, 22, 19

4. The hair colours of 75 people were noted.

Hair colour   No. of people
Brown         16
Blonde        25
Black         14
Red            7
Other         13

(a) What is the modal hair colour?
(b) Why is the mode the best measure of central tendency here?
(c) Why are the mean and median not appropriate measures here?

5. Joshua swam a kilometre each morning for 10 days in preparation for a swimming carnival. His times (in minutes) were:

28 24 22 24 25 24 26 26 24 27

(a) What is his median swim time?
(b) What is his mean swim time?
(c) What is his range of swim times?
(d) What is the interquartile range of his times?
(e) What is the standard deviation of his times (correct to 1 decimal place)?
(f) If Joshua asked you to tell him the most appropriate measures of location and spread for these times, which two would you choose? Justify your answer.

6. Ted and Julie were paid by piecework for making T-shirts. The numbers made each day over an 8-day period were:

Ted:   18 25 19 19 26 24 15 22
Julie: 16 20 21 28 12 26 18 19

(a) For each person find:
(i) the number of T-shirts made in the 8-day period
(ii) the interquartile range of T-shirts made
(iii) the mean number of T-shirts made
(iv) the standard deviation of T-shirts made (correct to 1 decimal place)

(b) Comment on the statement that Ted is a more consistent worker than Julie by comparing their means and standard deviations. Give reasons for your answer.

7. Numbers of motor accidents per week over a 9-week period at a busy intersection were:

4 3 6 0 4 9 2 3 5

(a) What is the median number of accidents?
(b) What is the mean number of accidents per week?
(c) Does the mean or median best describe the centre of this data set? Give reasons.
(d) What is the range of the data?
(e) What is the interquartile range?
(f) Find the standard deviation for the data (correct to 1 decimal place).
(g) Comment on the statement: ‘The number of accidents per week is fairly consistent’. Justify your answer.


8. This stem plot shows the waiting times in a medical centre (in minutes).

Stem | Leaf
1    | 2 4 5
2    | 3 6 9
3    | 0 1 4 5 7 7 8
4    | 2 4 5
5    | 1 3 7
6    | 2

(a) Find the mean waiting time (correct to 1 decimal place).
(b) Find the standard deviation of waiting times (correct to 1 decimal place).

9. The weekly wages of a group of teachers are shown in the table.

Weekly wage ($)   No. of teachers
500–<600           8
600–<700           5
700–<800          30
800–<900          25
900–<1000         12

(a) What is the mean weekly wage?
(b) What is the standard deviation of the weekly wages (correct to 1 decimal place)?

10. The percentage marks for 250 students in a Business Studies examination are listed.

Score     No. of students
11–20     14
21–30     18
31–40     15
41–50     26
51–60     20
61–70     31
71–80     44
81–90     39
91–100    43

(a) What is the mean mark (to the nearest whole number)?
(b) What is the standard deviation of the marks (correct to 1 decimal place)?
(c) Write a comment to the school principal describing the results of these students.

FEATURES OF A STATISTICAL DISPLAY

Shape

The shape of a statistical display shows how the data is distributed. When using a dot plot or histogram, a curve can be used to approximate the general shape.

[Figure: display of scores from 0 to 10 with a curve approximating its shape]

Clustering

Clustering occurs when scores are close together or ‘bunched up’. In this stem-and-leaf plot, the scores are clustered in the 50s and 80s.

Stem | Leaf
3    | 1 2
4    | 3 5 9
5    | 0 2 6 7 7 8 9
6    | 4 5
7    | 3 5 6
8    | 0 2 4 6 6 8 8 9
9    | 4 2 9


Symmetry

A distribution has symmetry if the scores are balanced or evenly spread about the centre of the distribution.

[Figure: symmetrical display of scores from 3 to 7]

In a symmetrical distribution, the mean, median and mode are usually the same. For this distribution, the mean, median and mode are all 5.

Skew

A distribution that is skewed is not symmetrical. The tail indicates the direction of the skew.
• If the scores are mostly low (or to the left), the distribution is positively skewed.
• If the scores are mostly high (or to the right), the distribution is negatively skewed.

This distribution is positively skewed. The tail points to the right, the positive direction.

[Figure: positively skewed distribution]

The data in this dot plot is negatively skewed. The tail points to the left, the negative direction.

[Figure: dot plot of scores from 21 to 28]

The data in this stem-and-leaf display is negatively skewed as the scores are mostly high with a tail towards the low scores. Clustering also occurs in the 70s and 80s.

Stem | Leaf
2    | 2 6
3    | 0 1
4    | 4 8
5    | 2 6 9
6    | 1 3 4 5
7    | 0 2 3 4 4 5 5 7 7 7 8 9
8    | 3 5 7 7 8 8 8 8 9
9    | 2 8

Peaks and modes

Peaks are the high points or ‘humps’ in a display. The highest peak is called the mode.

No peaks: display is uniform or flat and there is no mode.

[Figure: display of scores from 3 to 7]

One peak: display is unimodal. The mode here is 6.

[Figure: display of scores from 3 to 7]

Two peaks: display is bimodal. The mode here is 6. The mode is the higher of the two peaks.

[Figure: display of scores from 3 to 9]

Many peaks: display is multimodal. There are three peaks here and two modes: 5 and 7. There are two highest peaks here so there are two modes.

[Figure: display of scores from 2 to 10]


The three distributions show the relative positions of the mean, median and mode.

[Figure: three frequency curves (Frequency against Score) with the mean, median and mode marked: (a) Symmetrical, (b) Positively skewed, (c) Negatively skewed]

• For a symmetrical distribution, the mean, median and mode are usually equal.
• For a skewed distribution, the median is usually between the mean and mode and is the better measure of location.

Diagram (a) could represent results in an HSC General Mathematics examination.

Diagram (b) could represent traffic flow from 6 am to noon.

Diagram (c) could represent the heights of basketball players in a club.

Can you think of other situations that these diagrams could represent?

Think: Shape and measures of location
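The relative positions of the mean, median and mode can be explored numerically. The sketch below is a rough guide only: comparing the mean with the median is a common rule of thumb for the direction of skew, not a formal test, and equal values do not prove symmetry. The three data sets and the function name are invented for this illustration.

```python
from statistics import mean, median, mode

def describe_shape(scores):
    """Rule of thumb: the mean is pulled towards the tail of the distribution."""
    m, med = mean(scores), median(scores)
    if m > med:
        return "positively skewed"
    if m < med:
        return "negatively skewed"
    return "roughly symmetrical"

# Hypothetical scores, made up for this example.
low_heavy = [1, 2, 2, 2, 3, 3, 4, 5, 9]   # mostly low, tail to the right
high_heavy = [1, 5, 6, 7, 7, 8, 8, 8, 9]  # mostly high, tail to the left
balanced = [3, 4, 4, 5, 5, 5, 6, 6, 7]    # evenly spread about 5

for scores in (low_heavy, high_heavy, balanced):
    print(describe_shape(scores), mode(scores), median(scores), round(mean(scores), 2))
```

In the two skewed sets the median lies between the mode and the mean, matching the diagrams; in the balanced set all three measures equal 5.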

TEN HOT TIPS FOR TACKLING EXAMS

1. Find out about the format of the exam: the topics to be tested, the time allowed, the number and format of questions, the marks awarded, whether formulas are supplied.

2. Be prepared!

3. Spend the first 5 minutes browsing through the exam to see the work that is ahead of you. Note the harder questions—you may need to spend more time on them.

4. Spend the first minute of each question planning and thinking.

5. Keep an eye on the time. Don’t spend too much time on one question.

6. Write clearly. Draw big diagrams. Spread out your working and set it out neatly. Write down the page, not across.

7. Make sure you have answered the question. Did you remember to round off and/or include units? Did you use all of the relevant information given?

8. Attempt every question.

9. If the working-out to a hard question is taking too long, then it’s probably wrong. Don’t get bogged down. If you’re getting nowhere, retrace your steps, start again, or skip the question and return later with a fresh mind.

10. Once you have completed the exam, go over it again. Double-check your answers, especially the harder ones or those of which you’re unsure.

Study tips


Exercise 4-03: Features of a statistical display

1. Draw a curve representing a statistical display that:
(a) is symmetrical
(b) is positively skewed
(c) shows clustering
(d) is negatively skewed with clustering
(e) is symmetrical and bimodal

2. For each of the following displays state:
(i) if the data is symmetrical or skewed
(ii) if there are any clusters
(iii) if there are any outliers
(iv) how many peaks there are

(a) [Figure: Frequency (0 to 12) against Score (1 to 11)]

(b) [Figure: display of scores from 4 to 9]

(c)
Stem | Leaf
1    | 3 4 6 6 6 7 8 9 9
2    | 0 7
3    | 1 2 2 5 7 8 8 9
4    | 0 2 3
5    | 2 9

(d) [Figure: display of scores from 10 to 20]

(e) [Figure: Frequency (0 to 9) against Score (5 to 50)]

(f)
Stem | Leaf
4    | 1 3
5    | 5 5 6
6    | 0 3 5
6    | 5 6 8
7    | 2 6
8    | 5 5 8

3. The numbers of visits (or hits) to a popular Internet website were tabulated over a 10-hour period.

Time        Hits (× 1000)
1201–1300   1.3
1301–1400   0.8
1401–1500   0.4
1501–1600   2.1
1601–1700   2.6
1701–1800   4.5
1801–1900   3.9
1901–2000   5.3
2001–2100   2.3
2101–2200   1.2

Draw a histogram to represent this data and comment on the features of the display, such as shape, skew, clustering and peaks.


4. For the given information:

5 14 8 7 12 3 2 8 4 10 6 2 7 3 9 9 6 4 8 9

(a) draw a dot plot to display the data
(b) comment on the features of the display

5. Here is a set of data:

22 16 36 15 16 24 15 15 19 55 58 59 18 17 20 20 24 15 54 19
15 40 21 17 50 22 23 21 24 23 15 35 15 24 22 19 15 17 43 49

(a) Draw a stem-and-leaf display for this data set using stems 1, 2, 3, 4 and 5.
(b) Comment on the features of the display.
(c) Give the name of a possible population that this data could represent.

6. This dot plot represents the industrial accidents per month at a factory:

[Figure: dot plot, Accidents/month, scale from 0 to 9]

(a) What is the mean number of accidents in this period (correct to 1 decimal place)?
(b) What is the standard deviation (correct to 1 decimal place)?
(c) What could be a possible reason for the outlier 9?
(d) What are the mean and standard deviation if the outlier 9 is not included (correct to 1 decimal place)?
(e) Compare the means and standard deviations of the two groups of data.

INVESTIGATING OUTLIERS

Outliers often have the effect of raising or lowering a mean value but they can also affect the mode and median.

Example 8

A: 20 25 30 35 40 45
B: 20 25 30 35 40 60
C: 20 25 30 35 40 120

(a) Find the mean and median of each set of scores.
(b) The three data sets are the same except for the value of the last score. Investigate the effect of increasing the last score on the mean and median of set A.
(c) What are the values of the mean and median of set C if the outlier 120 is not included?

Solution

(a) A: 20 25 30 35 40 45
    B: 20 25 30 35 40 60
    C: 20 25 30 35 40 120
               ↑
         Median = 32.5

Set   Mean   Median
A     32.5   32.5
B     35     32.5
C     45     32.5


(b) Increasing the last score has no effect on the median. As the last score increases, so the value of the mean increases. The outlier of 120 has the greatest effect on the value of the mean.

(c) Set C without the score 120 has a mean and median of 30.
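The table in Example 8 can be reproduced with a few lines of Python, which also makes it easy to try other values for the last score. This is a minimal sketch using the standard `statistics` module; the variable names are chosen for this illustration.

```python
from statistics import mean, median

data_sets = {
    "A": [20, 25, 30, 35, 40, 45],
    "B": [20, 25, 30, 35, 40, 60],
    "C": [20, 25, 30, 35, 40, 120],
}

# Part (a): mean and median of each set.
summary = {name: (mean(scores), median(scores)) for name, scores in data_sets.items()}
for name, (avg, mid) in summary.items():
    print(f"Set {name}: mean = {avg}, median = {mid}")

# Part (c): drop the outlier 120 from set C and recompute.
c_without_outlier = [x for x in data_sets["C"] if x != 120]
trimmed = (mean(c_without_outlier), median(c_without_outlier))
```

The median stays at 32.5 across all three sets because the two middle scores (30 and 35) never change; only the mean responds to the size of the last score.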

Exercise 4-04: Investigating outliers

1. For each pair of data sets below find:
(i) the mean and median (correct to 1 decimal place)
(ii) the value of any outlier score
(iii) the effect on the mean and median of any outlier

(a) A: 10 12 14 16 18 20
    B: 10 12 14 16 18 40

(b) A: 5 37 41 53 56
    B: 36 37 41 53 56

(c) A: 3 4 8 9 12 14
    B: 3 6 7 10 13 25

(d) A: 110 120 130 135 135 140 140
    B: 55 115 135 140 145 145 150

2. For each data set below:
(i) find the mean, median and mode (correct to 1 decimal place where needed)
(ii) state the value of any outlier
(iii) say which measure of location is the most appropriate
(iv) sketch the shape

(a) 2 8 3 16 9 26 8
(b) 8 16 4 21 4 23 16 12
(c) 120 g 85 g 72 g 60 g 80 g 80 g
(d) 37°C 38°C 41°C 39°C 38°C 37°C 37°C

3. The 7 employees at the Bug and Beef Cafe earned the following wages in a week:

$350 $420 $510 $130 $635 $320 $460

(a) What is the mean wage?
(b) What is the median wage?
(c) Which is the more appropriate measure of location? Justify your answer.
(d) If each employee received a 10% pay rise, what would be the new mean and median wages?
(e) By what percentage would the mean increase?
(f) If the manager who earned $635 was not included in the data set, what would be the mean and median wages?

4. In a netball tournament of 5 matches, the numbers of points scored by three teams are:

The Wombats: 24 18 14 6 22
The Possums: 16 16 15 18 15
The Koalas:  36 8 14 16 12

(a) What are the mean and median for each team?
(b) Which team is more consistent? Why?
(c) An error was found in the recording for the Wombats. The score of 6 should have been 16. What are the new mean and median?
(d) Which team is more consistent now? Why?


5. Pam and Percy sell photocopiers. The numbers of copiers sold over a 10-week period are shown.

Pam: 1 2 3 3 5 6 7 8 12 25
Percy: 3 3 3 14 16 18 18 24 32 35

(a) What is the modal number of copiers sold by each person?
(b) What could you say about each person if you only knew the mode?
(c) What is the median number of copiers sold by each?
(d) What is the mean number of copiers sold by each?
(e) Which measure of location is the best measure to compare the sales performances of Pam and Percy?
(f) Who is the better salesperson? Why?

6. Choose 5 scores that have the same mean and median. What effect will adding a score of 100 have on the mean and median?

7. Rupert’s bookstore employs the following people with annual wages as shown:

2 store managers               $64 300
4 cashiers                     $34 200
3 part-time clerical staff     $28 500
10 salespeople                 $46 500
2 part-time cleaners           $13 500

(a) What is the modal wage? Why?
(b) What is the median wage?
(c) What is the mean wage (to the nearest dollar)?
(d) Which measure would Rupert use to make the salaries appear higher?
(e) Which measure of location (average) best represents the average wage for an employee at Rupert’s bookstore?

DISPLAYING AND COMPARING TWO DATA SETS

Double stem-and-leaf plots

By representing two related data sets in a double (back-to-back) stem-and-leaf display, similarities and differences, such as clustering and averages (measures of location), can be easily seen.

Example 9

This double stem-and-leaf plot shows the numbers of dollars spent by a group of students visiting the Easter show.

        Boys |   | Girls
 8 6 6 5 5 4 | 1 | 2 5 5 8
     6 4 3 2 | 2 | 0 2 4 5 5 5 6 7 8 9
       9 8 2 | 3 | 1 2 4
 5 3 2 1 1 0 | 4 | 0 2
           2 | 5 |

(a) How many students went to the show?
(b) Give two observations on the shape and features of the data.
(c) Calculate the mean and standard deviation (to the nearest 5 cents) of amounts spent by boys and by girls.
(d) Considering all the information you have, do you think that boys are the bigger spenders? Why?


Solution
(a) 39 students, consisting of 20 boys and 19 girls.
(b) The amounts spent by the girls show clustering at $20–$29, whereas the amounts spent by the boys are more evenly spread out. The data for the girls is positively skewed.
(c) Girls: Mean = $25.80, standard deviation (σn−1) = $8.00
    Boys: Mean = $30.10, standard deviation (σn−1) = $12.40
(d) Yes. The average amount spent by a boy was $30.10. This was about $4.30 more than the average amount spent by a girl.
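As a rough check of part (c), the scores can be read off the stem-and-leaf plot in Example 9 and passed to Python's `statistics` module. This is a sketch only: the lists `boys` and `girls` are transcribed from the plot, and `stdev` is the sample standard deviation (the σn−1 key on a calculator).

```python
from statistics import mean, stdev

# Dollars spent, read from the double stem-and-leaf plot (stem = tens digit).
boys = [14, 15, 15, 16, 16, 18,
        22, 23, 24, 26,
        32, 38, 39,
        40, 41, 41, 42, 43, 45,
        52]
girls = [12, 15, 15, 18,
         20, 22, 24, 25, 25, 25, 26, 27, 28, 29,
         31, 32, 34,
         40, 42]

for label, amounts in (("Boys", boys), ("Girls", girls)):
    # stdev divides by n - 1, matching the sample standard deviation.
    print(f"{label}: n = {len(amounts)}, "
          f"mean = ${mean(amounts):.2f}, "
          f"standard deviation = ${stdev(amounts):.2f}")
```

Rounded to the nearest 5 cents, these values agree with the means and standard deviations quoted in the solution.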

Hint: Use the statistical function on a scientific or graphics calculator.

Box plots

Whereas a stem-and-leaf plot gives a good visual comparison of the location of scores in a data set, a box plot (or box-and-whisker plot) shows the spread of the data. To compare two data sets, find a five-number summary for each and draw the box plots on the same scale.

[Diagram: a box plot labelled, from left to right, with the lower extreme, lower quartile (Q1), median (Q2), upper quartile (Q3) and upper extreme.]

The box contains the middle 50% of the scores, and each whisker covers a further 25% of the scores.

Example 10

The box plots below show the ranges of unleaded petrol prices in six cities in Australia.

[Figure: box plots of unleaded petrol prices for Canberra, Sydney, Melbourne, Adelaide, Brisbane and Darwin.]

(a) (i) Which city’s petrol prices had the smallest range?
    (ii) Which city’s had the largest range?
(b) In which city was petrol generally cheapest? Give a possible reason for this.
(c) Canberra, Sydney and Melbourne had the same range of prices.
    (i) Which of these three cities had the lowest median price?
    (ii) In which of these cities would you be more likely to pay a higher price for petrol?
(d) Write down one observation about petrol prices in Canberra.

Solution
(a) (i) Adelaide (ii) Darwin
(b) Brisbane. The government tax on petrol is lower than in the other cities and so the price paid by the consumer is lower.
(c) (i) Sydney (ii) Melbourne
(d) They were evenly spread across the city. The distribution of petrol prices is symmetrical.


Technology: Box plots on a graphics calculator

Using a graphics calculator is an easy and excellent way to compare box plots.

1. Enter the individual scores of the first data set in List 1.
2. Enter the individual scores of the second data set in List 2.
3. Set the GRAPH type to a median box plot (some calculators have a mean box plot as well).
4. Make sure that both Graph 1 and Graph 2 are ON.
5. Draw the graphs. Both graphs will appear on the screen at the same time, giving you an excellent comparison of the two data sets.

The calculator will also give you the five-number summary.

Example 11

Liz and George deliver pamphlets to letterboxes in the same neighbourhood. The numbers of pamphlets delivered per hour over 12 hours are shown:

Liz: 24 25 26 27 28 28 31 32 32 32 35 35
George: 15 18 21 24 25 29 31 31 32 38 38 45

(a) Represent the data in a double stem-and-leaf plot.
(b) Find a five-number summary for each data set and hence draw two box plots.
(c) Write down one observation that is best seen in the stem-and-leaf plot.
(d) Write down one observation that is best seen in the box plots.
(e) Which worker showed the greater interquartile range of pamphlets delivered? Which display shows this the best?
(f) Can we conclude that Liz is a better worker than George?

Solution
(a)
         Liz |   | George
             | 1 | 5 8
 8 8 7 6 5 4 | 2 | 1 4 5 9
 5 5 2 2 2 1 | 3 | 1 1 2 8 8
             | 4 | 5

(b) Liz: 24 25 26 27 28 28 31 32 32 32 35 35
    Lower extreme = 24
    Lower quartile = (26 + 27)/2 = 26.5
    Median = (28 + 31)/2 = 29.5
    Upper quartile = (32 + 32)/2 = 32
    Upper extreme = 35

    George: 15 18 21 24 25 29 31 31 32 38 38 45
    Lower extreme = 15
    Lower quartile = 22.5
    Median = 30
    Upper quartile = 35
    Upper extreme = 45

[Figure: box plots for Liz and George on a common scale of pamphlets/hour from 15 to 45.]


(c) The stem-and-leaf plot shows that the number of pamphlets delivered per hour by Liz was always in the 20s and 30s.

(d) The box plots show the median number of pamphlets delivered per hour by both was about the same (around 30) but George’s range was greater.

(e) George. This is obvious from the box plots. The interquartile range is the length of ‘the box’.

(f) If an employer was looking for consistency, Liz is the more consistent worker as she had less variation in the number of pamphlets delivered per hour. However, for the total number of pamphlets delivered, both employees delivered approximately the same number of pamphlets. We cannot conclude that Liz is a better worker than George.
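The five-number summaries in Example 11 can also be found with a short program. The sketch below assumes the method used in the solution (quartiles taken as the medians of the lower and upper halves of the ordered scores); the function name `five_number_summary` is illustrative, and other software may use a slightly different quartile rule.

```python
from statistics import median


def five_number_summary(scores):
    """Return (lower extreme, Q1, median, Q3, upper extreme).

    Q1 and Q3 are the medians of the lower and upper halves of the
    ordered scores; for an odd number of scores the middle score is
    left out of both halves.
    """
    ordered = sorted(scores)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    upper_half = ordered[-half:]
    return (ordered[0], median(lower_half), median(ordered),
            median(upper_half), ordered[-1])


liz = [24, 25, 26, 27, 28, 28, 31, 32, 32, 32, 35, 35]
george = [15, 18, 21, 24, 25, 29, 31, 31, 32, 38, 38, 45]

liz_summary = five_number_summary(liz)
george_summary = five_number_summary(george)

for name, s in (("Liz", liz_summary), ("George", george_summary)):
    iqr = s[3] - s[1]  # interquartile range = Q3 - Q1
    print(f"{name}: {s}, interquartile range = {iqr}")
```

The interquartile range is the length of the box, so the larger value for George matches the answer to part (e).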

Think: Is the outlier in or out?

What to do with outliers?
- If an outlier is considered to be feasible, you can include it in the whiskers.
- If an outlier is considered to be an error, you need not include it in the whiskers but can represent it as a separate point.

[Figure: two box plots on a scale from 1 to 11, one labelled ‘Outlier excluded’ and the other ‘Outlier included’.]

Can you describe a situation that these box plots could represent?

Exercise 4-05: Displaying and comparing two data sets

1. The numbers of dollars spent by a class of students visiting the Easter show were discussed in Example 9 (page 124).

        Boys |   | Girls
 8 6 6 5 5 4 | 1 | 2 5 5 8
     6 4 3 2 | 2 | 0 2 4 5 5 5 6 7 8 9
       9 8 2 | 3 | 1 2 4
 5 3 2 1 1 0 | 4 | 0 2
           2 | 5 |

(a) Find a five-number summary for each data set.
(b) What is the interquartile range of each?
(c) Draw two box plots representing the data sets.
(d) What information is seen more easily in the box plots?

2. A teacher proposes that ‘People always underestimate the length of a piece of string’. A group of students decide to investigate this theory. They each estimate the lengths of several pieces of string and then measure the actual lengths.

[Figure: box plots labelled ‘Actual’ and ‘Estimates’ on a scale of length of string (cm) from 5 to 40.]

(a) Write down the median of the estimated lengths.
(b) Write down the median of the actual lengths.
(c) What are the range and interquartile range for each data set?
(d) Would you agree with the teacher’s theory? Justify your answer.


3. Here are two sets of scores represented in a stem-and-leaf display.

[Stem-and-leaf display: Set A | stems 0 to 9 | Set B. The row positions of the leaves were lost in extraction; the surviving fragments, in extraction order, are: 255, 8, 5702, 3, 852 4 72, 4.]

(a) Find the range and interquartile range of each set.
(b) Find the median for each set.
(c) Draw box plots representing the data sets.
(d) Write down one observation from the stem-and-leaf plot and one from the box plots.

4. The pulse rates (in beats/minute) of two groups of people were recorded:
Group X: 77 72 80 77 91 62 72 82 79 58 75 67 69 66 98 81
Group Y: 81 86 64 74 92 75 73 81 64 52 82 79 80 53 62 78

(a) Draw a back-to-back stem-and-leaf plot.
(b) What is the mean of each group (correct to 1 decimal place)?
(c) What is the median of each group?
(d) Which is the better measure of location? Why?
(e) Comment on the shape of each group in the stem-and-leaf plot.

5. A group of 20 people had their pulse rates taken before and after an exercise class.

[Figure: box plots labelled ‘Before exercise’ and ‘After exercise’ on a scale of pulse rate (beats/min) from 40 to 140.]

(a) By how much did the median pulse rate increase?
(b) The lower extreme ‘before’ and ‘after’ the class did not change. Give a possible reason for this.
(c) Give a possible reason for the outlier pulse rates in the ‘after exercise’ box plot.
(d) How many people had a pulse rate between 64 and 72 before the exercise class?
(e) What was the interquartile range of pulse rates after the class?

6. Eighteen people took part in the QUIT smoking program. The numbers of cigarettes smoked per day were recorded before the start of the program and 6 weeks later:
Before: 21 10 36 42 16 23 32 42 9 14 21 18 34 45 12 18 16 28
6 weeks later: 6 24 31 38 21 25 16 19 16 18 28 32 8 13 40 38 16 28

(a) What is the interquartile range for each data set?
(b) Draw two box plots on the same scale showing ‘before’ and ‘6 weeks later’.
(c) Is the QUIT program working for these people? Justify your answer.

7. The following data shows the average number of rainy days per month for two capital cities, and is supplied by the Bureau of Meteorology.

Month       J   F   M   A   M   J   J   A   S   O   N   D
Sydney      12  12  13  12  12  12  10  10  10  12  11  12
Melbourne   8   7   9   12  14  14  15  16  15  14  12  11

(a) Use a double stem-and-leaf plot to display the data.
(b) Draw box plots representing the data.
(c) Write down one observation from each display.
(d) ‘Melbourne is much wetter than Sydney.’ Do you agree with this statement? Justify your answer.

8. This display represents the lifetime in hours of two brands of light globes.

                Oso Bright |    | Brighta Longa
                 6 5 5 4 2 | 10 | 3 4 5
       8 7 7 7 7 7 7 4 4 3 | 11 | 2 3 3 4 4 5 6
   9 9 8 8 7 6 6 6 5 4 4 0 | 12 | 1 2 2 3 3 4 5 5 7 9 9 9
     8 8 8 7 7 7 6 5 4 3 1 | 13 | 0 2 3 3 4 4 4 5 6 6 8 8 9
           9 8 8 8 5 5 2 2 | 14 | 1 2 2 3 5 5 6 7 8 8
                   7 7 5 1 | 15 | 0 3 3 4 6

(a) How many of each brand of light globe were tested?
(b) What is the mean lifetime of ‘Oso Bright’ globes (correct to 1 decimal place)?
(c) What is the mean lifetime of ‘Brighta Longa’ globes (correct to 1 decimal place)?
(d) Find the standard deviation of the lifetime of each brand (correct to 1 decimal place).
(e) Draw box plots representing the data sets.
(f) Which brand of globe would you say is better? Explain your answer.

COMPARING DATA SETS USING CHARTS

Radar chart

A radar chart is used to plot changes over a certain period or cycle, such as temperature during a 24-hour period, but it is also useful for comparing two sets of data.

A radar plotting chart (or polar graph paper) can be used to manually plot data, but the best option is to generate the radar chart from a spreadsheet package on a computer.

Example 12

This radar chart shows air pollution levels at two different workplaces over a 10-day period.

[Figure: radar chart ‘Air pollution levels’ with axes for Day 1 to Day 10, a scale from 0 to 100, and plots for the meatworks and the oil refinery.]

(a) What was the air pollution level at the meatworks on day 10?
(b) What was the air pollution level at the oil refinery on day 1?
(c) On what days was the pollution level above 50 at the oil refinery?
(d) What were the maximum and minimum pollution levels? When and where did they occur?
(e) By comparing the areas contained within each graph, decide which workplace had the higher overall pollution level.


Solution
(a) About 60.
(b) About 45.
(c) Days 4, 6, 8 and 9.
(d) The maximum level was about 85 on day 4 at the oil refinery and the minimum level was about 25 on day 1 at the meatworks.
(e) The oil refinery graph seems to cover a slightly larger area and so had a higher level of pollution over the 10-day period.

Area chart

An area chart consists of different ‘areas’ or ‘bands’, each representing a data set over a given period of time. It shows the sum of the data over the given time as well as the relationship of the parts to the whole. Its main purpose is to emphasise changes during this time. An area chart can be plotted on graph paper or drawn using the Chart option in a spreadsheet package. There are several chart subtypes that you can investigate.

Example 13

The table shows the numbers of males and females in full-time employment in January from 1990 to 2000.

Year                1990  1992  1994  1996  1998  2000
Males (×10 000)     350   160   200   320   360   450
Females (×10 000)   80    50    60    120   140   200

Construct an area chart showing the contribution of male and female employees to Australia’s full-time workforce.

Solution
Step 1 Draw a line graph for males using the values in the table and shade below it. This area represents the male employees.

Step 2 Draw a line graph for total employees by adding the values for females to those of males. Shade the area between the two lines. This area represents the female employees.

For example, in January 2000 the full-time workforce was 6 500 000, and this was made up of 4 500 000 males and 2 000 000 females.
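The heights of the upper line in Step 2 are simply the male and female values added together for each year. A minimal sketch of that arithmetic, using the table values from Example 13 (the variable names are illustrative):

```python
# Values from the table in Example 13, in units of 10 000 employees.
years = [1990, 1992, 1994, 1996, 1998, 2000]
males = [350, 160, 200, 320, 360, 450]
females = [80, 50, 60, 120, 140, 200]

# Step 1: the lower line follows the male values.
# Step 2: the upper line follows males + females for each year.
totals = [m + f for m, f in zip(males, females)]

for year, m, total in zip(years, males, totals):
    print(f"{year}: lower line at {m}, upper line at {total} "
          f"(total workforce {total * 10_000:,})")
```

The band between the two lines has a height equal to the female value for that year, which is why the area between the lines represents the female employees.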

[Figure: area chart ‘Australia’s full-time workforce’ showing number of employees (×10 000, scale 0 to 700) against year (1990 to 2000), with bands for Males and Females.]


Example 14

This area chart compares the unemployment rates for males and females from 1981 to 1997.

[Figure: area chart ‘Unemployment rates’ showing percentage (0 to 25) against year (1981 to 1997), with bands for Males and Females.]

Hint: Use your ruler to help you measure the vertical distances.

(a) For the year 1985 find:
    (i) the unemployment rate for males
    (ii) the combined unemployment rate
    (iii) the unemployment rate for females
(b) What was the unemployment rate in 1993?
(c) What trends in the unemployment rate can be seen over the period from 1981 to 1997?

Solution
(a) (i) About 8%.
    (ii) About 17%.
    (iii) About 9% (subtract the 8% rate for males from the 17% total rate).
(b) About 22%.
(c) The unemployment rate rose from about 12% in 1981 to 17% in 1997. A fall in the unemployment rate occurred from 1985 to 1989 followed by a rise before another fall from 1993 to 1997. The unemployment rate was at its highest in 1993.

Technology: Using a spreadsheet to draw an area chart or radar chart

Radar charts and area charts are drawn in a similar way using a spreadsheet package. Use a spreadsheet to draw the area chart for Australia’s full-time workforce (Example 13 on page 130).

Exercise 4-06: Comparing data sets using charts

1. The numbers of clear days for the ski resorts of Thredbo and Perisher in the Snowy Mountains area of NSW are shown in the radar chart.

[Figure: radar chart ‘Clear days in the ski fields of NSW’ with axes for Jan to Dec, a scale from 0 to 10, and plots for Thredbo and Perisher.]

(a) How many clear days did Thredbo have in March?
(b) What was the greatest number of clear days at either resort? When was this?
(c) How many days were not clear in Perisher in July?
(d) Which data set contains the largest area? What does this area refer to?
(e) ‘The weather is better for skiing at Perisher.’ Do you agree with this statement? Justify your answer.


2. The area chart shows the number of wage earners employed in the public and private sectors in Australia over different years.

[Figure: area chart ‘Wage earners in Australia’ showing number of wage earners (×1000, scale 0 to 8000) against year (1991 to 1997), with bands for the private sector and the public sector.]

(a) How many wage earners were there in the public sector in 1997?
(b) What was the total number of wage earners in 1993?
(c) How many wage earners were employed in the private sector in 1991?
(d) What trends can be seen over the period from 1991 to 1997?
(e) What similarities or differences can be seen between the public and private sectors?

3. The area chart shows the seasonal rainfall for an island group in the Pacific Ocean.

[Figure: area chart ‘Seasonal rainfall for island group’ with a vertical scale from 0 to 400 against season (Summer, Autumn, Winter, Spring), with bands for the southwestern, southeastern and northern regions.]

(a) What was the rainfall for the southeastern region in summer?
(b) What was the rainfall for the northern region in spring?
(c) What was the total rainfall in autumn?
(d) The southeast is the wettest region. How is this shown in the graph? What could be a possible reason for one area getting more rain than the others?
(e) What trends in the rainfall can be seen over the year?
(f) What similarities or differences in rainfall can be seen between the regions?

4. Mr Pappadopoulos was admitted to hospital with a suspected stomach ulcer. His fluid intake (e.g. water and medicine) and output (e.g. urine) over a 24-hour period are summarised in the following table.

Time          6 am   8 am   10 am   12 noon       2 pm   4 pm
Intake (mL)   170    240    150     110           250    90
Output (mL)   140    150    80      180           130    90

Time          6 pm   8 pm   10 pm   12 midnight   2 am   4 am
Intake (mL)   150    60     180     170           160    210
Output (mL)   60     220    110     160           100    140

(a) Represent the data in a radar chart.
(b) By considering the areas enclosed by each data set, what observation can you make about Mr Pappadopoulos’s intake and output over the 24-hour period?
(c) Write down two other observations from your radar chart.


5. Clark and Lois earn extra money for writing articles for newspapers and magazines. They save these amounts in a joint holiday fund. Their monthly earnings last year are shown in the table.

Month   Clark’s earnings ($)   Lois’s earnings ($)
Jan     370                    150
Feb     240                    420
Mar     530                    480
Apr     570                    530
May     780                    850
Jun     1030                   1280
Jul     770                    920
Aug     620                    650
Sep     790                    810
Oct     520                    480
Nov     430                    390
Dec     490                    350

(a) Represent the data in a radar chart.
(b) Represent the data in an area chart.
(c) What information is best seen in the radar chart?
(d) What trends are clearly seen in the area chart?

6. [Figure: area chart ‘Australia’s population by age groups’ showing percentage (0 to 100) against year (1921 to 2041), with bands for Age 0–14, Age 15–59 and Age 60+.]

(a) What information is contained in the graph?
(b) How do you think data for the years 2021–2041 was obtained?
(c) Describe the features of the part of the graph for the 15–59 age group.
(d) In 1961, approximately what percentage of the population was between (i) 0 and 14 (ii) 15 and 59?
(e) Approximately what percentage of the population is expected to be over 60 in 2021?
(f) Give two facts about Australia’s population that can be seen in the graph.
(g) What does this area chart show about age groups in the future?

TWO-WAY TABLES

Two-way tables are used to compare two characteristics, for example gender and health.

Example 15

A National Health Survey in 1995 compared the number of adults in a population who exercised regularly to those who didn’t. The data is displayed in a two-way table.

          Exercise   No exercise
Male      3028       1532
Female    1804       946

(a) How many people were surveyed?
(b) What percentage of the people surveyed were female? Give your answer correct to 1 decimal place.
(c) What percentage of females exercised regularly?
(d) What percentage of the population did not exercise regularly?
(e) Comment on the statement ‘Men and women are similar in their exercise habits’.


Solution
(a) Number of people surveyed = 3028 + 1532 + 1804 + 946 = 7310
(b) Number of females = 1804 + 946 = 2750
    Percentage of people who were female = (2750/7310) × 100 = 37.6%
(c) Percentage of females who exercised = (1804/2750) × 100 = 65.6%
(d) Percentage of people who did not exercise = ((1532 + 946)/7310) × 100 = 33.9%
(e) Number of males = 3028 + 1532 = 4560
    Percentage of males who exercised = (3028/4560) × 100 = 66.4%
    Since the percentage of females who exercised was 65.6% and the percentage for males was 66.4%, there is no significant difference, so the statement is supported by this data.
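The row, column and grand totals used in this solution can be computed directly from the two-way table. The sketch below stores the table as a nested dictionary; the names `table` and `percentage` are illustrative only.

```python
# Two-way table from Example 15: rows are gender, columns are exercise habit.
table = {
    "Male": {"Exercise": 3028, "No exercise": 1532},
    "Female": {"Exercise": 1804, "No exercise": 946},
}


def percentage(part, whole):
    """Return part as a percentage of whole, correct to 1 decimal place."""
    return round(100 * part / whole, 1)


total = sum(sum(row.values()) for row in table.values())
females = sum(table["Female"].values())
males = sum(table["Male"].values())
no_exercise = sum(row["No exercise"] for row in table.values())

pct_female = percentage(females, total)                               # part (b)
pct_female_exercise = percentage(table["Female"]["Exercise"], females)  # part (c)
pct_no_exercise = percentage(no_exercise, total)                      # part (d)
pct_male_exercise = percentage(table["Male"]["Exercise"], males)      # part (e)

print(total, pct_female, pct_female_exercise, pct_no_exercise, pct_male_exercise)
```

Note that each percentage uses a different ‘whole’: the grand total for parts (b) and (d), but the row total for parts (c) and (e).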

Exercise 4-07: Two-way tables

1. The population of a town was surveyed in 1990 and 1997 to find out who had private health insurance.

             1990   1997
Private      4563   4048
No private   5577   8602

(a) What was the population of the town in 1990?
(b) What was the population of the town in 1997?
(c) What percentage of the town had private health insurance in 1990?
(d) What percentage of the town had private health insurance in 1997?
(e) Suggest a reason for the decrease in the percentage of people with private health insurance.

2. The percentages of Australians living in rural areas in 1911 and 1996 were compared.

                     1911   1996
Rural areas          43%
Urban (city) areas          87%

(a) Copy and complete the table.
(b) What percentage of Australians lived in urban areas in 1911?
(c) Comment on the differences between 1911 and 1996.

3. In one area there are three phone companies providing a service for mobile phones. The number of people using each company as a provider was recorded over a 3-year period.

       Telstra   Optus     Vodaphone
1995   204 695   194 198   125 967
1996   315 144   216 276   86 510
1997   402 628   304 025   115 037

(a) How many people in this area owned mobile phones in (i) 1995 and (ii) 1997?
(b) What percentage of people used Telstra as their provider in 1996?
(c) What percentage of people used Optus as their provider in 1997?
(d) What share of the market did Vodaphone have in (i) 1995 and (ii) 1997?
(e) What happened to Telstra’s share of the market from 1995 to 1997?
(f) What happened to Optus’s share of the market from 1995 to 1997?
(g) Comment on the statement ‘Telstra users doubled from 1995 to 1997’.

4. A survey was taken on whether to change the Australian flag or not. The results are shown in the table, grouped by age in years.

                  18–24   25–39   40–54   55–69
Change the flag   790     640     450     140
Keep the flag     1240    860     930     620

(a) How many people surveyed voted to (i) change the flag and (ii) keep the flag?
(b) What percentage of those surveyed wanted to keep the flag?
(c) What percentage of 18–24-year-olds wanted to change the flag?
(d) Which group was most definite in its response? What was this response? Why do you think this is so?

TEN MORE HOT TIPS FOR TACKLING EXAMS

1. Bring all of your equipment: pens, paper, geometrical instruments, calculator (check calculator works).

2. Don’t worry if you feel nervous before an exam. This is normal and helps you perform better. However, being too casual or too anxious can be harmful to your performance.

3. Write in black or blue, not red. Don’t use liquid paper. Use pencil only for diagrams and constructions.

4. Read each question and identify what needs to be found.

5. You don’t need to be writing all of the time. What you are writing may be wrong and a waste of time. Spend some time thinking and considering the best approach.

6. Make sure your answer sounds reasonable and realistic, especially if it involves money or measurement.

7. If you make a mistake, cross it out with a neat line. Don’t scribble over it completely. You may still get marks for it if it is right. Don’t use liquid paper. It is both time-consuming and messy.

8. Don’t cross out or change an answer rashly. You may have been right the first time.

9. Don’t round off in the middle of a calculation. Round off at the end only.

10. Don’t be afraid to write words and sentences in your working, but don’t use abbreviations that you’ve just made up.

Study tips


USING MULTIPLE DISPLAYS TO COMPARE DATA SETS

Relationships between data sets can often be interpreted and described more effectively by using more than one display. Looking at a variety of different displays allows a better comparison of data sets as some features are more obvious in one display than in another.

Every day in the media you will find examples of multiple displays describing data sets.

A company director compares this year’s figures with those of previous years. Medical researchers compare the effects of a new drug on men and women for similarities and differences. Local councils investigate the population mix in a new suburban area in order to provide the most appropriate facilities.

Let us start with two simple data sets and look at three different ways of comparing them.

Example 16

The data sets A and B are displayed as lists, dot plots, a frequency table and a clustered column graph.

Lists
A: 5 6 7 8 9
B: 5 5 7 9 9

Dot plots
[Figure: dot plots of set A and set B, each on a score axis from 5 to 9.]

Frequency table

Score   Frequency (Set A)   Frequency (Set B)
5       1                   2
6       1                   0
7       1                   1
8       1                   0
9       1                   2

Column graph
[Figure: clustered column graph of frequency (0 to 3) against score (5 to 9) for Set A and Set B.]

(a) Comment on the shape and features of each data set.
(b) Find the mean, median and mode for each set.
(c) Find the range, interquartile range and standard deviation of each set.
(d) Comment on the benefits of using multiple displays to describe the data sets and to find measures of location and spread.

Solution
(a) Set A is symmetrical and flat.
    Set B is symmetrical and has two peaks; that is, it is bimodal.
(b) Set A: Mean = 7, Median = 7, No mode
    Set B: Mean = 7, Median = 7, Mode = 5, 9
(c) Set A: Range = 4, Interquartile range = 3, Standard deviation (σn−1) = 1.58
    Set B: Range = 4, Interquartile range = 4, Standard deviation (σn−1) = 2
(d) Multiple displays cater for differences in people’s preferences as well as allowing for different statistical needs. The dot plots and column graph give good visual representations of the data sets and are best used to describe the shape and features of the data sets. The measures of location and spread are best found from the lists or frequency table, although the other displays can also be used.
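The measures of location and spread for Example 16 can be found from the lists with a few lines of code. This sketch uses Python's `statistics` module and assumes the textbook conventions: quartiles are medians of the lower and upper halves, the standard deviation is the sample value (σn−1), and a set in which every score occurs equally often is reported as having no mode. The function name `summarise` is illustrative.

```python
from statistics import mean, median, multimode, stdev


def summarise(scores):
    """Return the measures of location and spread for a list of scores."""
    ordered = sorted(scores)
    half = len(ordered) // 2
    q1 = median(ordered[:half])    # median of the lower half
    q3 = median(ordered[-half:])   # median of the upper half
    modes = multimode(ordered)
    if len(modes) == len(set(ordered)):
        modes = []                 # every score equally frequent: no mode
    return {
        "mean": mean(ordered),
        "median": median(ordered),
        "modes": modes,
        "range": ordered[-1] - ordered[0],
        "iqr": q3 - q1,
        "sd": stdev(ordered),      # sample standard deviation (n - 1)
    }


set_a = summarise([5, 6, 7, 8, 9])
set_b = summarise([5, 5, 7, 9, 9])

for name, result in (("A", set_a), ("B", set_b)):
    print(f"Set {name}: {result}")
```

Both sets share the same mean, median and range, so only the mode, interquartile range and standard deviation separate them.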


Exercise 4-08: Using multiple displays to compare data sets

1. Two groups, each containing 15 people, were given a small timer and asked to stop the timer when they thought 60 seconds had elapsed. The results, in seconds, for the ‘estimated minute’ are listed:

Group A: 34 43 45 50 62 64 65 65 66 68 69 70 71 75 81
Group B: 42 46 48 48 49 50 55 58 60 61 62 64 65 68 70

(a) Construct a double stem-and-leaf plot.
(b) Draw a clustered column graph with classes 30–39, 40–49, …
(c) Draw box plots to represent the data sets.
(d) Write down one piece of information that is clearly shown in each of the three displays you have drawn.
(e) Find the mean and standard deviation of each data set (correct to 1 decimal place).
(f) Comment on the ability of each group to estimate a minute.

2. A coach, deciding which team should win the ‘most consistent players’ award, compared the season’s scores for two netball teams:

The Birds: 55 23 35 51 56 48 70 52 64 72
The Bees: 18 41 23 46 48 24 56 27 36 48

(a) Display the data in a stem-and-leaf plot, box plots and a column graph.
(b) Use your displays to describe the shape and features of each data set.
(c) By finding suitable measures of location and spread, decide which team is more consistent. Justify your answer.

3. The populations of two regions were surveyed to find out who belongs to a workers’ union. The results are tabulated and shown in a back-to-back histogram.

Table

Age              15–24   25–34   35–44   45–54   55–64   65+
Eastern region   35%     49%     54%     51%     62%     11%
Western region   34%     36%     38%     42%     45%     4%

Back-to-back histogram
[Figure: back-to-back histogram ‘Union membership by age and region’ showing % belonging to a workers’ union (0 to 70 in each direction) for the age groups 15–24 to 65+, with Eastern on one side and Western on the other.]

(a) Write down two comparisons you can make between the two data sets.
(b) Use the information to comment on the statement ‘People in the eastern region are more likely to join a union’. Justify your answer.


4. The heights of a group of men and women were measured to the nearest centimetre. The data was then represented in a double stem-and-leaf display and also as box plots.

Stem-and-leaf

                    Men | Stem | Women
                      8 |  15  | 2 4 4 5 6 8 8 9
              9 7 7 5 2 |  16  | 0 2 3 3 4 5 5 5 5 6 7 8 8
9 9 8 8 6 5 5 4 4 4 2 1 |  17  | 2 3 4 4
                8 6 3 2 |  18  | 3
                      4 |  19  |

Box plots

[Box plots of the heights of men and women on a common scale: Height (cm), 145 to 195]

(a) What information is better shown in the stem-and-leaf display?
(b) What information is better shown in the box plots?
(c) What are the medians and interquartile ranges of the heights of men and women?
(d) Calculate the means and standard deviations of the heights of men and women (correct to 1 decimal place).
(e) Write down two similarities between the heights of men and women.
(f) Write down two differences between the heights of men and women.

5. The table below gives the average number of rainy days per month for the Australian capital cities.

City       J   F   M   A   M   J   J   A   S   O   N   D
Adelaide   5   4   6   9  14  13  17  16  13  11   8   7
Brisbane  13  13  15  11  10   8   7   7   7   9  10  12
Canberra   7   7   7   8   9   9  10  12  10  11  10   8
Darwin    21  20  19   9   2   1   0   1   2   6  12  16
Hobart    11  10  11  12  14  14  15  15  15  16  14  13
Melbourne  8   7   9  12  14  14  15  16  15  14  12  11
Perth      3   3   4   8  13  17  18  16  13  10   7   4
Sydney    12  12  13  12  12  12  10  10  10  11  11  12

(a) Draw at least two suitable displays illustrating the data.
(b) Calculate the mean and median number of rainy days for each city.
(c) Find the range and standard deviation of the number of rainy days for each city.
(d) Use these statistical measures and displays to determine:
(i) which city is driest
(ii) which city is wettest
(iii) which city has the most consistent pattern of rainy days
(iv) which city has most variation in the number of rainy days per month


6. Use the table in question 5 to consider the rainfall per season in Australia. The seasons are summer (D, J, F), autumn (M, A, M), winter (J, J, A) and spring (S, O, N).
(a) Draw at least two suitable displays to illustrate the data.
(b) Calculate the mean, median, range and standard deviation for each season.
(c) Use these statistical measures and displays to determine:
(i) which is the wettest season
(ii) which is the driest season
(d) Comment on the statement ‘Rainfall in Australia does not vary much between seasons’.

Just for the record

BABY BOOMERS

After World War II finished in 1945, there was a ‘baby boom’ in Australia, New Zealand, Britain and North America. This rapid growth in the number of babies born lasted until the mid-1960s. People born during this time are referred to as ‘baby boomers’. The result of the large increase in births during this period will affect Australia’s population statistics as this group of people age. The two graphs show the baby boomer population moving from 2001 to 2031.

In 2031, the baby boomers will be over 65 years. Approximately how many more persons aged over 65 will there be in 2031 compared with 2001?

[Two graphs: Age distribution of Australian population, 2001 and 2031. Horizontal axis: age groups 0–5 to 86+; vertical axis: population (× 1000), 0 to 1600; the baby boomer age groups are marked on each graph]


Modelling activity: Analysing data sets

One of the main roles of a statistician is to critically analyse related data sets and report on the findings. Businesses often use the results of an analysis for promotional purposes and companies report to their shareholders.

To critically analyse data sets:
• Draw suitable displays.
• Find measures of location and spread.
• Write a report on the relationship between the data sets, commenting on any similarities and differences between the data sets, unusual features, outliers or patterns.
• Draw conclusions and make recommendations.

1. Twenty overweight people enrolled in a weight loss program at Rhonda’s Weight Loss Centre. Their weights (in kilograms) before and after the program were:

Before: 128 159 85 76 93 125 102 74 88 82 97 84 106 125 76 80 92 77 115 102

After: 75 72 64 95 58 62 120 93 85 72 102 65 73 62 56 60 105 82 52 64

Critically analyse the data and report back to Rhonda on how she can best advertise the success of her centre.

2. The times taken (in seconds) to check a basket of 20 grocery items at 15 automated and 15 manual checkouts were:

Automated: 45 58 63 43 75 69 84 65 96 73 90 61 84 72 96

Manual: 95 105 82 110 125 148 136 137 86 99 145 119 101 97 124

Critically analyse the data and report back to the manager of a store on the benefits of installing automated checkouts based on this data.

Investigation: Collecting and analysing data sets

Obtain published data from the media or Internet, collect data through experiment or simulation, or use data already collected for your statistics file. Critically analyse the data sets by drawing appropriate graphs and tables, determining measures of location and spread, and writing a report on your findings.

Some suggested data sets are:
• the performances of two sporting teams (e.g. football or netball) in a season
• the performance of a sporting team in home and away matches
• pulse rates of males and females before and after exercise
• spending patterns of men and women
• heights and weights of males and females
• scores in two subject tests
• waiting times at a checkout on different days
• pollution levels at different times in the same city or in two different cities
• rainfall in two different towns or regions
• part-time incomes of male and female students.
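The ‘find measures of location and spread’ step of a critical analysis can also be carried out with technology. The Python sketch below is one possible approach, not the textbook’s own method. It uses the Sydney and Melbourne rainy-day figures from the chapter opener, takes the quartiles as the medians of the lower and upper halves of the ordered scores, and uses the population standard deviation; both of those choices are assumptions. The function names `quartiles` and `summary` are made up for illustration.

```python
from statistics import mean, median, pstdev

def quartiles(data):
    """Q1 and Q3 as medians of the lower and upper halves of the ordered scores.

    The middle score is left out when the number of scores is odd.
    Assumes at least two scores.
    """
    ordered = sorted(data)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[-half:])

def summary(data):
    """Measures of location and spread for one data set."""
    q1, q3 = quartiles(data)
    return {
        "mean": mean(data),
        "median": median(data),
        "range": max(data) - min(data),
        "IQR": q3 - q1,
        "sd": round(pstdev(data), 1),  # population standard deviation (assumed)
    }

# Average number of rainy days per month (J to D), as listed in the chapter opener.
sydney = [12, 12, 13, 12, 12, 12, 10, 10, 10, 12, 11, 12]
melbourne = [8, 7, 9, 12, 14, 14, 15, 16, 15, 14, 12, 11]

for city, days in (("Sydney", sydney), ("Melbourne", melbourne)):
    print(city, summary(days))
```

The numbers are only the starting point: the report still has to interpret them, for example by noting that on these figures Melbourne has a slightly higher mean but a much larger spread than Sydney.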


Investigation: Population pyramids

A population pyramid displays information about the ages of a population. The oldest age group is at the top and hence the display resembles a pyramid. A simple population pyramid (or back-to-back histogram) is shown in question 3 of Exercise 4-08.

1. This population pyramid shows a profile of the Australian population from 1911 to 2051. It is actually three pyramids together, showing the years 1911, 1996 and the population projection for 2051.

[Population pyramid: Profile of Australia’s population, 1911–2051. Males on the left, females on the right; horizontal axis: population in thousands, 0 to 200 on each side; vertical axis: age, 0 to 100+; one pyramid each for 1911, 1996 and 2051]

(a) Compare the numbers of males and females over 60 in 1911 and in 2051.
(b) How many females were 35 in 1996?
(c) How many males were 20 in 1911?
(d) Find one age group where there are more males.
(e) Find one age group where there are more females.
(f) Write down three differences between the population in 1911 and in 1996.

2. Investigate the age of the Aboriginal and Torres Strait Islander population and compare with the general Australian population using a population pyramid. You can find the necessary information at the following website: www.abs.gov.au.
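The layout of a back-to-back histogram, which a population pyramid shares, can be sketched in plain text. The Python example below is an illustration only and uses the union membership percentages from question 3 of Exercise 4-08. The function name `back_to_back`, its parameters and the scale of one `#` per 2 percentage points (rounded down) are all assumptions made for this sketch.

```python
# Percentage of each age group belonging to a workers' union (question 3, Exercise 4-08).
ages = ["15-24", "25-34", "35-44", "45-54", "55-64", "65+"]
eastern = [35, 49, 54, 51, 62, 11]
western = [34, 36, 38, 42, 45, 4]

def back_to_back(labels, left, right, left_title, right_title, per_char=2):
    """Build text rows with the oldest group on top.

    Left bars grow to the left, right bars grow to the right;
    each '#' stands for `per_char` percentage points, rounded down.
    """
    width = max(left) // per_char
    rows = [f"{left_title:>{width}} | {'Age':^5} | {right_title}"]
    for label, lv, rv in zip(reversed(labels), reversed(left), reversed(right)):
        left_bar = "#" * (lv // per_char)
        right_bar = "#" * (rv // per_char)
        rows.append(f"{left_bar:>{width}} | {label:^5} | {right_bar}")
    return rows

for row in back_to_back(ages, eastern, western, "Eastern", "Western"):
    print(row)
```

Placing the age groups down the middle lets the two regions be compared row by row, which is the same idea a population pyramid uses for males and females.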


Chapter review

Statistical distributions
1. Collecting and displaying data
2. Summary statistics
3. Features of a statistical display
4. Investigating outliers
5. Displaying and comparing two data sets
6. Comparing data sets using charts
7. Two-way tables
8. Using multiple displays to compare data sets

This chapter, Statistical distributions, revises and extends the statistics covered in the Preliminary Course. It compares two data sets in a variety of displays, including double stem-and-leaf plots, box plots, radar charts and area charts. You also used measures of location and spread to compare data sets and learned how to interpret information from different displays. Be sure to include area charts and the effect of outliers in your summary. You could also include a glossary of statistical terms.

Make a summary of this topic. Use the chapter outline above as a guide. An incomplete mind map has also been started below. Use your own words, symbols, diagrams, boxes and reminders. Use the questions in Your say below to think about your understanding of the topic. Gain a ‘whole picture’ view of the topic and identify any weak areas.

Topic summary

[Incomplete mind map: ‘Statistical distributions’ at the centre, with branches for Area charts, Two-way tables, Stem-and-leaf plots, Radar charts, Box plots, Outliers, Comparing data sets, Summary statistics, Measures of spread and Measures of location]


Your say: Reflecting about the topic

• Have you satisfied the outcomes listed at the front of this chapter?
• What was the most important thing that you learned?
• How did you feel about the topic? Did you enjoy it?
• What was new?
• What are your weaknesses? What will you need to study more?
• How will you revise and summarise this topic?

Chapter assignment

1. Classify the data as (i) quantitative and discrete, (ii) quantitative and continuous, or (iii) categorical.
(a) numbers of cows on farms in NSW
(b) numbers of letters delivered each day to households in Campbelltown
(c) annual water consumption in Sydney
(d) numbers of workers who travel to work by public transport
(e) ages of first-year university students
(f) favourite movie

2. Find the mean, median and mode for each data set and suggest a possible population from which each set of data was taken.
(a) 10 11 11 12 12 12 13 13
(b) 3 3 3 4 4 4 5 5 5
(c) 72 72 73 75 76 83 84 85 87 94

3. Consider the set of scores: 3 4 5 5 8 9 12 15 18 20
(a) What is the mean?
(b) What is the median?
(c) Without doing any calculations, say what the effect on the mean and median would be of adding:
(i) one score of 30
(ii) one score of 50
(iii) a score of zero
(iv) a score of 10
(d) What would be the effect on the mean and median if each score was:
(i) increased by 2?
(ii) decreased by 3?

4. For each statistical display below:
(i) find the mean and standard deviation of the data set (to 1 decimal place)
(ii) describe the shape and features of the distribution

(a) [Display: Wages from part-time job (× $10), 6 to 20, on the horizontal axis; Frequency, 0 to 15, on the vertical axis]


(b) [Display for question 4: No. of overseas trips, 2 to 10, on the horizontal axis]

(c) [Display for question 4: Hours spent doing homework per day, 1 to 9, on the horizontal axis; Frequency, 0 to 15, on the vertical axis]

5. Match the box plots to the following data sets.

[Four box plots, labelled A to D, on a common scale: Age (years), 0 to 90]

(a) a random sample of 30 spectators at a football match
(b) a group of 30 senior citizens on a bus trip
(c) a group of 30 dancers at a nightclub
(d) two teachers taking a group of 30 primary students to the zoo

6. A factory produces small metal rods, designed to have a mass of 50 g. Samples were taken from two different machines and compared.

      Machine A | Stem | Machine B
                |  4   | 4
      9 9 9 9 9 |  4   | 8 9
3 2 2 1 1 0 0 0 |  5   | 0 0 0 0 0 1 1 1 2 3 3 3 4 4
    8 8 7 6 6 5 |  5   | 5 6 7 7 8 9
              0 |  6   | 1 2 3
                |  6   | 5

(a) Find the mean and standard deviation for each machine (correct to 1 decimal place).
(b) What are the median and interquartile range for machine A?
(c) What are the median and interquartile range for machine B?
(d) Construct box plots for the two data sets.
(e) Comment on the statement ‘Machine B produces rods of a more consistent mass than machine A’.

7. This back-to-back stem-and-leaf plot compares the maximum average monthly temperatures (°C) for two towns in NSW.

   Goulburn | Stem | Grafton
9 6 6 3 2 1 |  1   |
7 6 5 4 2 0 |  2   | 0 0 1 2 3 4 6 6 8 8 9
            |  3   | 0

(a) What was the highest average monthly temperature for Grafton?
(b) What was the range of temperatures for each town?
(c) What was the median temperature for each town?
(d) Name two features of the data sets that differ.


8. The monthly rainfall (in millimetres) for two areas in Australia for July to December 1999 is given in the table.

Month   Jul  Aug  Sep  Oct  Nov  Dec
Area 1    2  165   60   92  160   94
Area 2   23   11   14    2    5    6

(a) Represent the data in an area chart.
(b) Write down two observations about the rainfall in 1999.
(c) Give one difference between area 1 and area 2.

9. [Display: Employed persons by occupation and sex, Australia, 1996. Scale: (× 1000), 0 to 900; series: Females, Males; occupations: Labourers and related; Elementary clerical, sales and service; Intermediate production and transport; Intermediate clerical, sales and service; Advanced clerical and service persons; Tradespersons and related; Associate professionals; Professionals; Managers and administrators]

(a) State the type of display and what information is being displayed.
(b) Is the data quantitative or categorical? Justify your answer.
(c) In which occupations were more females employed than males in 1996?
(d) Which occupation had the biggest gender difference?
(e) Comment briefly on the strengths and weaknesses of the display.

10. This population pyramid (back-to-back histogram) shows Australia’s population in 1995 by age and gender.

[Population pyramid: Australia’s population by age and gender, 1995. Male on the left, Female on the right; horizontal axis: Population (× 1000), 0 to 1500 on each side; vertical axis: age groups 0–9 to 70+]

(a) What was the female population in the 30–39 age group?
(b) What was the population of males in the 10–19 age group?
(c) How many people were aged 60 and over?
(d) Does the graph support the statement ‘Women live longer than men’? Justify your answer.


11. The results for two classes in a Geography test are listed below:

Class A: 43, 50, 54, 63, 75, 48, 68, 72, 65, 63, 70, 69, 55, 64, 73, 66, 50, 59, 68, 71, 73, 64

Class B: 35, 89, 42, 79, 45, 90, 64, 53, 66, 82, 71, 63, 32, 79, 44, 92, 46, 63

(a) Represent the data in a back-to-back stem-and-leaf plot with stems 3, 4, …
(b) Use your display to comment on the shape of each class data set by describing any outliers, clusters, peaks, symmetry or skew.
(c) Find the range and interquartile range for each class.
(d) Find the standard deviation for each class (correct to 1 decimal place).
(e) Which measure would best describe the spread of the two data sets? Why?
(f) Which measure of location, the mean or median, would better describe each data set? Why?

12. This area chart shows the percentage of Australia’s national income saved by households and companies.

[Area chart: Savings by households and companies, Australia. Horizontal axis: Year, 1962–63 to 1998–99; vertical axis: % of national income, 0 to 25; series: Household saving, Corporate saving (or company profits)]

(a) What percentage was saved by households in 1980–81?
(b) What was the combined percentage saved in 1992–93?
(c) In which period was the corporate saving highest?
(d) What were the similarities and differences in saving patterns of households and companies?
(e) What trends in savings can be seen from 1962 to 1999?

13. A random sample of people took part in a survey to see who had been to the dentist in the last 6 months.

Age 15–19 20–24 25–29 30–34 35–39 40–44 45–49 50+
Dentist 21 18 54 43 58 43 38 15
No dentist 24 35 51 47 52 49 38 20

(a) How many people were surveyed?
(b) How many of those surveyed had been to the dentist?
(c) What percentage of the 40–44 age group had been to the dentist?
(d) What percentage of those surveyed were under 25 and had been to the dentist?
(e) What percentage of those surveyed were between 25 and 39?
