Post on 13-Dec-2015
Statistics 3
Continuous data
• Collecting continuous data (measured data, with few values the same).
• We collect data using tally charts when we want to group data into categories
• or stem and leaf plots when we want to have a record of each data value.
Oyster length data tally chart
Length (mm)   Tally        Frequency
150 - 160     l            1
160 - 170                  0
170 - 180     l            1
180 - 190                  0
190 - 200     llll ll      7
200 - 210     llll llll    10
210 - 220     l            1
220 - 230     llll l       6
230 - 240     lll          3
240 - 250                  0
250 - 260                  0
260 - 270                  0
270 - 280                  0
280 - 290                  0
290 - 300                  0
300 - 310                  0
310 - 320     l            1
150 - 160 means all data from 150 mm up to, but not including, 160 mm.
The best graph to show this data is a histogram.
Histogram
[Histogram: Lengths of Oyster. x-axis: Length (mm), intervals 150 - 160 to 310 - 320; y-axis: Frequency, 0 to 12]
• Note that we always put units on the axes.
• There are no gaps between bars as the data is continuous.
• The graph shows us that we have some outliers.
• Most of the lengths are between 190 and 240 mm.
• The modal interval is 200 mm to 210 mm.
Stem and Leaf Plot
15 | 9
16 |
17 | 0
18 |
19 | 0 3 4 5 7 7 8
20 | 1 3 3 3 5 6 7 7 8 8
21 | 7
22 | 2 4 7 8 8 9
23 | 0 3 4
24 |
25 |
26 |
27 |
28 |
29 |
30 |
31 | 8
It looks a lot like the histogram just on its side.
We can get the same information from this plot as we can from the histogram.
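As a rough sketch of that claim, the stem and leaf values can be regrouped into the same 10 mm intervals used in the tally chart. The code assumes each stem is the length in tens of mm and each leaf is the units digit (so stem 15 with leaf 9 is read as 159 mm); the names `stem_and_leaf`, `interval_start` and `frequencies` are illustrative only.

```python
from collections import Counter

# Stem and leaf rows copied from the plot.
# Assumption: stem = tens of mm, leaf = units digit (15 | 9 is 159 mm).
stem_and_leaf = {
    15: [9],
    17: [0],
    19: [0, 3, 4, 5, 7, 7, 8],
    20: [1, 3, 3, 3, 5, 6, 7, 7, 8, 8],
    21: [7],
    22: [2, 4, 7, 8, 8, 9],
    23: [0, 3, 4],
    31: [8],
}

# Rebuild the individual lengths in mm.
lengths = [stem * 10 + leaf
           for stem, leaves in stem_and_leaf.items()
           for leaf in leaves]


def interval_start(length_mm, width=10):
    """Lower bound of the interval that contains length_mm.

    An interval such as 150 - 160 includes 150 but not 160.
    """
    return (length_mm // width) * width


# Count how many lengths fall in each interval.
frequencies = Counter(interval_start(x) for x in lengths)

# The modal interval is the one with the highest frequency.
modal_start = max(frequencies, key=frequencies.get)

for start in range(150, 320, 10):
    print(f"{start} - {start + 10}: {frequencies.get(start, 0)}")
print(f"Modal interval: {modal_start} mm to {modal_start + 10} mm")
```

The printed frequencies should match the tally chart, which is one way to check that the two displays carry the same information.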
Line graphs
• Line graphs are used to show a change or a difference between sets of data.
Example
• We could look at the pre-test and post-test scores of a class to gauge if there has been a change.
[Line graph: Pre-test and post-test results. x-axis: Marks, 0 to 12; y-axis: Frequency, 0 to 10; lines: Pre-test, Post-test]
• This gives us a picture of what happened.
• Another way to show this would have been to draw a box and whisker plot.
Note that the lower quartile (LQ) and the median were the same value.
[Box and whisker plots: Pre-test and post-test results. Axis: Marks, 0 to 12; one plot each for Pre-test and Post-test]
• The box and whisker plots give us a clear indication that there has been a change, but they don’t show us that 25 marks were recorded in the first test and only 17 in the second. Hence we cannot be certain of the conclusions that we make.
Paying our bills
• The media often talk about people receiving a percentage increase in their wages.
• Is this a fair way to look at things when everyone pays the same amount for goods?
Example
• Mr Jones gets $381 a week. He spends $108 on food for his family of four.
• Mrs Smith gets $684 a week. She spends $146 on food for her family of four.
• The best way to show this situation is to draw 2 pie graphs.
Comparing
[Pie graph: Mr Jones. Food 28%, Rest 72%]
[Pie graph: Mrs Smith. Food 21%, Rest 79%]
Although Mrs Smith spends more on food, the percentage of her income is less.
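The percentages in the pie graphs can be checked with a short calculation. This is a minimal sketch using the weekly figures from the example; the function name `food_share` is illustrative only.

```python
def food_share(weekly_income, weekly_food_spend):
    """Return food spending as a percentage of weekly income."""
    return 100 * weekly_food_spend / weekly_income


# Figures taken from the example: (income, food spend) in dollars per week.
jones = food_share(381, 108)
smith = food_share(684, 146)

print(f"Mr Jones:  food {jones:.0f}%, rest {100 - jones:.0f}%")
print(f"Mrs Smith: food {smith:.0f}%, rest {100 - smith:.0f}%")
```

Rounded to the nearest whole percent, the results agree with the 28% and 21% shown for food.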
Misleading Graphs
[Graph: Company profits. x-axis: Year, 2000 to 2003; y-axis: Profit ($000s), 1000 to 1400]
• The impression we get is that this company is improving quite dramatically.
• Notice where the y-axis begins: at 1000, not at 0.
[Graph: Company profits, redrawn. x-axis: Year, 2000 to 2003; y-axis: Profit ($000s), 0 to 1600]
• Using ‘0’ as a starter value gives us a more honest representation of the data and we now see that the increase is less dramatic.
[Graph: School Exam Passes. x-axis: School, A to D; y-axis: Number of passes, 0 to 120]
• The principal of school B says her school is the most successful.
• Do you agree?
• We need more information to make a judgement.
School data
School   Number who sat   Number who passed   Percentage who passed
A        60               40                  67%
B        257              100                 39%
C        75               60                  80%
D        180              70                  39%
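The percentage column follows directly from the other two columns. The sketch below recomputes it and shows that the school with the most passes is not the school with the highest pass rate. The names `school_data` and `pass_rates` are illustrative only.

```python
# (number who sat, number who passed) for each school, from the table.
school_data = {
    "A": (60, 40),
    "B": (257, 100),
    "C": (75, 60),
    "D": (180, 70),
}

# Pass rate = number who passed as a percentage of number who sat.
pass_rates = {
    school: 100 * passed / sat
    for school, (sat, passed) in school_data.items()
}

most_passes = max(school_data, key=lambda s: school_data[s][1])
best_rate = max(pass_rates, key=pass_rates.get)

for school, rate in pass_rates.items():
    print(f"School {school}: {rate:.0f}% passed")
print(f"Most passes: school {most_passes}")
print(f"Highest pass rate: school {best_rate}")
```

Which of the two measures is the fairer one depends on who was allowed to sit the exam, so neither number settles the question alone.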
Comparing graphs
[Graph: School Exam Passes. x-axis: School, A to D; y-axis: Number of passes, 0 to 120]
[Graph: Percentage who Passed. x-axis: School, A to D; y-axis: Percentage who passed, 0 to 90]
We now think that C is the best school because the pass rate is better.
You now find out that school C will not let weak students sit the exam as it spoils their percentage.
We now think that A is the best school as the results for school C don’t represent the situation correctly.
Exam results: George Smythe
Subject       Result
English       M
Mathematics   E
Economics     N
Geography     A
French        A
• From George’s results we know his best subject is Mathematics and his worst subject is Economics.
• True or False?
Comment on features that are misleading.
[Graph: Sales of shoes Jan-Mar 1990. x-axis: Shoe Size, marked 3, 4, 5, 5.5, 6, 6.5, 7, 7.5, 8, 9, 10; y-axis: No. of pairs sold, 100 to 110]
• Inappropriate graph: it does not show the data correctly. The data is discrete, yet the graph shows it as continuous.
• Should have been a bar graph.
• Scale on y-axis is not uniform.
• Suppression of ‘0’, i.e. the y-axis does not start at zero.
• Non-uniform scale on the x-axis.
Comment on the features that are misleading.
[Graph: Use of poisoned bait. x-axis: Year, 1985 to 1990; y-axis: Amount used, 0 to 12]
• No units on the vertical axis.
• As the height increases, so do the width and thickness, so the visual impact of the area (and volume) gives a distorted impression.
• It is not clear where on each carrot the value should be read from.
Misleading data
Year   Average weekly wages (40 hours work)   Men’s suits (average price)
1971   $64                                    $62
1982   $280                                   $230
The Minister of Consumer Affairs says that suits are really cheaper in 1982 than 1971. Explain his reasoning.
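One possible line of reasoning, offered here as an assumption because the slides do not state the answer, is to compare how many hours of work at the average wage were needed to buy a suit in each year. The function `hours_to_buy_suit` and the dictionary `figures` are illustrative names only.

```python
HOURS_PER_WEEK = 40  # the wage figures are quoted for 40 hours of work

# Average weekly wage and average suit price in dollars, from the data.
figures = {
    1971: {"weekly_wage": 64, "suit_price": 62},
    1982: {"weekly_wage": 280, "suit_price": 230},
}


def hours_to_buy_suit(weekly_wage, suit_price, hours_per_week=HOURS_PER_WEEK):
    """Hours of work at the average wage needed to pay for one suit."""
    hourly_wage = weekly_wage / hours_per_week
    return suit_price / hourly_wage


hours = {year: hours_to_buy_suit(**values) for year, values in figures.items()}

for year, h in hours.items():
    print(f"{year}: about {h:.1f} hours of work to buy a suit")
```

On this measure a suit cost fewer hours of work in 1982 than in 1971, even though its dollar price was higher.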