How NOT to Lie with Statistics


Workshop Presentation: How NOT to Lie with Statistics, July 16, 2009, Indianapolis, IN


1. Suzi Shapiro, July 16, 2009
How NOT to Lie with Statistics

2. We will discuss:
1. How to calculate basic statistics with a spreadsheet (Mean, Median, Mode, Standard Deviation, Correlation Coefficient)
2. How to interpret basic statistics
3. How to prepare graphs that do not distort the information
3. What do you need to know to deal with the most common statistics?
4. What do you need to know?
Describe the data
What are the limitations of the statistic?
How can the data be clearly represented?
5. Where do Statistics come from?
Measure Something!
Nominal: Give it a name or category and count it
Ordinal: Put it in some type of order and decide which comes first
Interval: How long does it take to get from one thing to the next in line?
Ratio: Can the distance or time be zero? Are all of the intervals the same size?
6. Nominal: Give it a name or category and count it
Names of colors including only:
Red
Orange
Yellow
Green
Blue
Purple
Other
7. Ordinal: Put it in some type of order and decide which comes first
Which of the following best represents your opinion of online surveys?
Online surveys are a waste of time.
Strongly Disagree
Disagree
Agree
Strongly Agree
8. Interval: How long does it take to get from one thing to the next in line?
9. Ratio: Can the distance or time be zero? Are all of the intervals the same size?
What is your current age (in years)?
0, 1, 2, 3, 4, n, 110, 111, 112
10. What are its limitations?
Nominal: Can only count
Ordinal: Can only rank
Interval: Can calculate, but may be deceiving
Ratio: Dependent on probability and sample size
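The limitations above can be written down as a small lookup table. This Python sketch uses only the statistics the presentation itself lists for each level (slides 21, 43, 46, and 48); the names `ALLOWED` and `can_use` are illustrative and do not come from any library.

```python
# Statistics listed for each level of measurement on slides 21, 43, 46, and 48.
ALLOWED = {
    "nominal": {"frequencies", "mode", "percentages"},
    "ordinal": {"frequencies", "mode", "percentages", "rank"},
    "interval": {"frequencies", "mode", "percentages", "rank"},
    "ratio": {"frequencies", "mode", "percentages", "rank",
              "mean", "median", "standard deviation"},
}


def can_use(level: str, statistic: str) -> bool:
    """Return True if the slides list `statistic` for data at `level`."""
    return statistic.lower() in ALLOWED[level.lower()]


print(can_use("ordinal", "mean"))
print(can_use("ratio", "Standard Deviation"))
```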
11. What about Probability?
Am I sure that my measurement is correct?
How often do I think this event will happen?
12. Lottery Logic
When I flip a coin, there are two options. If I pick heads, 50% of the time I win and 50% of the time I lose.
When I buy a lottery ticket, there are also two options: win or lose.
Therefore, I should win 50% of the time.
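A quick simulation shows why "two options" does not mean "equal odds." This Python sketch assumes a hypothetical one-in-a-million chance of winning the lottery (real odds vary by game); `simulate` is an illustrative helper.

```python
import random


def simulate(win_probability: float, trials: int, seed: int = 1) -> float:
    """Return the fraction of simulated trials that were wins."""
    rng = random.Random(seed)
    wins = sum(rng.random() < win_probability for _ in range(trials))
    return wins / trials


# A fair coin: two outcomes, each equally likely.
coin = simulate(0.5, 100_000)
# A lottery ticket: still two outcomes, but a hypothetical 1-in-1,000,000 chance to win.
lottery = simulate(1 / 1_000_000, 100_000)

print(f"coin win rate:    {coin:.3f}")
print(f"lottery win rate: {lottery:.6f}")
```

Both games have exactly two outcomes, but only the coin wins about half the time; the number of options says nothing about how likely each one is.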
13. Describing Data
Normal Distributions
Measuring something that is influenced by many factors will often give data that falls in a normal distribution
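The idea that many small influences add up to a bell curve is the central limit theorem. In this Python sketch, each hypothetical measurement is the sum of 50 independent random factors, each uniformly distributed (so no single factor is normal). If the total is close to normal, about 68% of the measurements should fall within one standard deviation of the mean.

```python
import random
import statistics


def sum_of_factors(n_factors: int, rng: random.Random) -> float:
    """One measurement built from many small, independent, uniform influences."""
    return sum(rng.uniform(-1, 1) for _ in range(n_factors))


rng = random.Random(42)
measurements = [sum_of_factors(50, rng) for _ in range(10_000)]

mean = statistics.mean(measurements)
sd = statistics.stdev(measurements)
# Share of measurements within one standard deviation of the mean.
within_one_sd = sum(abs(x - mean) <= sd for x in measurements) / len(measurements)

print(f"mean={mean:.2f}, sd={sd:.2f}, share within 1 SD={within_one_sd:.2f}")
```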
14. Statistics Describe Data
Normal Distributions
Describe things that can be measured where very few things are at the very top or bottom of the measurement
15. Men's Heights
(Graph: distribution of men's heights, from short to tall)
16. Men's Heights
(Graph: distribution of men's heights, from short to tall)
17. Most statistics
Assume that the thing you are measuring is normally distributed
Assume that you have measured enough things for random error to become small
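Why sample size matters: the random error in a sample mean shrinks in proportion to one over the square root of the sample size (the standard error of the mean, sigma divided by the square root of n). This Python sketch draws many samples from a hypothetical normal population with mean 100 and standard deviation 15 and compares the observed spread of the sample means with that prediction. The helper name `spread_of_sample_means` is illustrative.

```python
import math
import random
import statistics


def spread_of_sample_means(sample_size: int, repeats: int = 2_000, seed: int = 7) -> float:
    """Draw `repeats` samples of `sample_size` from a hypothetical population
    (mean 100, standard deviation 15); return the standard deviation of their means."""
    rng = random.Random(seed)
    means = [statistics.mean(rng.gauss(100, 15) for _ in range(sample_size))
             for _ in range(repeats)]
    return statistics.stdev(means)


for n in (10, 40, 160):
    observed = spread_of_sample_means(n)
    predicted = 15 / math.sqrt(n)  # standard error of the mean
    print(f"n={n:4d}  observed={observed:.2f}  predicted={predicted:.2f}")
```

Quadrupling the sample size roughly halves the random error, which is why a small sample (such as 33 survey responses) calls for caution.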
18. Randomness
Many events are random or unpredictable because they are influenced by too many other events and outside factors.
Human beings try to find patterns and predictability in randomness
19. Calculating Statistics
Statistical Analysis Program
On paper or with a calculator
In a spreadsheet
20. Spreadsheets
Rows and Columns
Formulas
Graphs
Excel
21. Nominal Data
Frequencies, Mode, & Percentages
22. What is your favorite color?
Green
Blue
Green
Blue
Earth tones
Orange
Purple
Green
Blue
Fushia
23. Favorite Color
24. What is your favorite color?
Chrome
Purple
Orange
Blue, no green (aaaaaaaaahhhhhhhh)
Red
Blue
Green
Green
Yellow.
Purple
25. Favorite Color
26. What is your favorite color?
Green
Blue
Blue
Blue
Orange
Green
Sage green
Red
Red
Red
27. Favorite Color
28. What is your favorite color?
Red
Blue
Purple
29. Favorite Color
30. What is your favorite color? SUMMARY
How would you describe the responses?
Do you see any problems in this data?
31. Favorite Color
32. Percentages
Color    Count  Percent
Red        5      15%
Orange     3       9%
Yellow     1       3%
Green      8      24%
Blue       9      27%
Purple     4      12%
Black      0       0%
White      0       0%
Other      3       9%
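The counts and percentages in the table can be checked with a few lines of Python. This sketch copies the category counts from the table (Black and White are left out because they received no count) and uses `collections.Counter` to list frequencies, find the mode, and compute the combined share of Blue and Green.

```python
from collections import Counter

# Category counts copied from the favorite-color percentage table.
counts = Counter({"Red": 5, "Orange": 3, "Yellow": 1, "Green": 8,
                  "Blue": 9, "Purple": 4, "Other": 3})
total = sum(counts.values())  # number of responses

# Frequencies and percentages, most common first.
for color, n in counts.most_common():
    print(f"{color:<7}{n:>3}{n / total:>6.0%}")

mode = counts.most_common(1)[0][0]  # the most frequent category
blue_green = (counts["Blue"] + counts["Green"]) / total
print(f"responses: {total}; mode: {mode}; Blue + Green: {blue_green:.0%}")
```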
33. Report
In response to the question "What is your favorite color?", Blue and Green were the most common answers.
These two colors accounted for just over half of the responses (17 of 33, or 52%).
The remaining responses suggest that more people report red and purple than yellow and orange.
34. Report
Answers to this question should not be taken as an indication of the color that would be chosen in other contexts.
Because this sample included only 33 responses, a larger sample should be used to determine whether this pattern would hold in other samples or cultures.
35. Favorite Colors
36. Data from another survey
The study includes results from 232 people from 22 countries.
The mean age of this group is 30.34, with the youngest being 15 and the oldest 81.
http://www.joehallock.com/edu/COM498/index.html
37. Favorite Colors
38. Table Versus Graph?
http://www.visualsymbols.com/webgraphics/2004_Global_Color_Report1a.pdf
39. Default Graph in Excel
40. Graph in Spectral Order
41. Color Coded Graph by Country
42. What can you conclude?
43. Ordinal Data
Frequencies, Mode, Percentages, & Rank
44. Rank Statistics
Likert Scales
Strongly Agree
Agree
Disagree
Strongly Disagree
45. Likert Data
46. Interval data
Frequencies, Mode, Percentages, & Rank
47. Likert Data
Means
(Pretending we have better data)
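One way to handle Likert data is to code each answer by its rank. This Python sketch uses ten hypothetical responses to the online-survey statement from slide 7, codes Strongly Disagree as 1 through Strongly Agree as 4, and reports the frequencies, mode, and median. The mean is computed only to mirror the "pretending we have better data" caveat, because it assumes the steps between answers are equal.

```python
import statistics
from collections import Counter

# Hypothetical responses to "Online surveys are a waste of time."
responses = ["Agree", "Disagree", "Strongly Agree", "Agree", "Disagree",
             "Agree", "Strongly Disagree", "Agree", "Disagree", "Strongly Agree"]

# Rank order of the scale, from least to most agreement.
SCALE = ["Strongly Disagree", "Disagree", "Agree", "Strongly Agree"]
codes = [SCALE.index(r) + 1 for r in responses]  # 1 = Strongly Disagree ... 4 = Strongly Agree

freq = Counter(responses)
mode = freq.most_common(1)[0][0]
median_code = statistics.median_low(codes)  # always an actual scale point
mean_code = statistics.mean(codes)          # only meaningful if the steps are equal

print(freq)
print("mode:", mode)
print("median:", SCALE[median_code - 1])
print("mean code (pretending the data are interval):", round(mean_code, 2))
```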
48. Ratio Data
Frequencies, Mode, Percentages, Rank, Mean, Median, Standard Deviation
49. Only Ratio Data
Can technically be used in calculations
Can be used for Inferential Statistics
50. Calculating Statistics
Frequencies
Percentages
Minimum
Maximum
Range
Mode
Median
Mean
Standard Deviation
51. Formulas
Minimum: =MIN(B2:B20)
Maximum: =MAX(B2:B20)
Range: =MAX(B2:B20)-MIN(B2:B20)
Mode: =MODE(B2:B20)
Mean: =AVERAGE(B2:B20)
Median: =MEDIAN(B2:B20)
Standard Deviation: =STDEV(B2:B20)
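For readers working outside a spreadsheet, Python's standard `statistics` module has direct counterparts to these formulas. The ages below are hypothetical stand-ins for the range B2:B20, not the workshop's survey data. Excel's STDEV and Python's `statistics.stdev` both compute the sample standard deviation (dividing by n - 1).

```python
import statistics

# Hypothetical ages standing in for the spreadsheet range B2:B20.
ages = [22, 29, 31, 35, 38, 41, 41, 44, 47, 50, 52, 58, 63]

summary = {
    "Minimum": min(ages),                          # =MIN(B2:B20)
    "Maximum": max(ages),                          # =MAX(B2:B20)
    "Range": max(ages) - min(ages),                # =MAX(B2:B20)-MIN(B2:B20)
    "Mode": statistics.mode(ages),                 # =MODE(B2:B20)
    "Mean": statistics.mean(ages),                 # =AVERAGE(B2:B20)
    "Median": statistics.median(ages),             # =MEDIAN(B2:B20)
    "Standard Deviation": statistics.stdev(ages),  # =STDEV(B2:B20), sample SD
}

for name, value in summary.items():
    print(f"{name:<20}{value:6.1f}")
```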
52. Mean, Median, & Mode
They tell you about the middle of the data
If the data are normally distributed, all three fall at the center, about halfway between the minimum and maximum
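A small, perfectly symmetric hypothetical data set makes the point concrete: the mean, median, and mode all land on the same value, which is also halfway between the minimum and maximum.

```python
import statistics

# A small, perfectly symmetric hypothetical data set.
scores = [2, 3, 3, 4, 4, 4, 5, 5, 6]
midpoint = (min(scores) + max(scores)) / 2  # halfway between minimum and maximum

print("mean:  ", statistics.mean(scores))
print("median:", statistics.median(scores))
print("mode:  ", statistics.mode(scores))
print("halfway between min and max:", midpoint)
```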
53. Individual Age Data
54. Age Data as a Trend Line
55. Survey Age Data
Minimum 24.0
Maximum 65.0
Range 41.0
Mean 44.9
Median 44.0
Mode 53.0
Standard Deviation 10.9
56. Grouped Age Data
57. Report
Most respondents are middle-aged, with about 70 percent between the ages of 34 and 56 (within one standard deviation of the mean).
The youngest was 24 and the oldest 65.
58. When Means go Bad
If there are extreme scores at one end (very high or very low), the mean may not be in the middle.
Common for data such as:
income, age, prices of products, etc.
Use the MEDIAN (the middle score) instead.
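A hypothetical set of eight incomes with one extreme value shows how far a single outlier can pull the mean away from the median.

```python
import statistics

# Hypothetical yearly incomes: most are modest, one is extremely high.
incomes = [28_000, 31_000, 35_000, 38_000, 42_000, 45_000, 51_000, 1_200_000]

mean_income = statistics.mean(incomes)      # pulled upward by the one extreme value
median_income = statistics.median(incomes)  # average of the two middle values

print("mean:  ", mean_income)
print("median:", median_income)
```

Here the mean (183,750) is higher than seven of the eight incomes, while the median (40,000) still describes a typical person in the group.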
59. Standard Deviation
One Standard Deviation from the Mean
How far below and above the mean you need to go to include about 68% of the things you are measuring (for normally distributed data).
About 99.7% of things are within 3 standard deviations of the mean.
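For normally distributed data, the shares within 1, 2, and 3 standard deviations of the mean follow from the normal distribution itself. This Python sketch uses `statistics.NormalDist` to compute them and then applies the one-standard-deviation range to the survey age summary (mean 44.9, standard deviation 10.9), assuming those ages are roughly normal.

```python
from statistics import NormalDist

standard = NormalDist(mu=0, sigma=1)

# Share of a normal distribution within k standard deviations of the mean.
shares = {k: standard.cdf(k) - standard.cdf(-k) for k in (1, 2, 3)}
for k, share in shares.items():
    print(f"within {k} SD: {share:.1%}")

# One-SD range for the survey ages, assuming they are roughly normal.
mean, sd = 44.9, 10.9
print(f"one SD range: {mean - sd:.1f} to {mean + sd:.1f}")
```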
60. BONUS Statistic
The Correlation Coefficient
Measures the strength and direction of the relationship between two sets of measurements
Can be used to predict the value of one measurement if the other is known
Does not tell you if the two measurements are CAUSALLY related.
61. Spreadsheet Formula
You need two sets of Ratio Data
=CORREL(B9:B40,C9:C40)
The correlation between age and percent of grey hair in an imaginary sample is 0.63.
A positive value means people tend to have more grey hair as they get older.
A value less than 1 means the relationship is not perfect: some people go grey faster or slower than others.
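Excel's CORREL returns the Pearson correlation coefficient. This Python sketch computes the same statistic from its definition (covariance divided by the product of the standard deviations). The ages and grey-hair percentages are hypothetical and are not the presentation's imaginary sample, so the result will not be 0.63; `pearson_r` is an illustrative name.

```python
import math


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient, the statistic Excel's CORREL returns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical ages and percent grey hair.
ages = [25, 32, 38, 41, 47, 52, 58, 63]
grey = [0, 5, 3, 20, 15, 40, 35, 70]
print(round(pearson_r(ages, grey), 2))
```

A value near +1 or -1 means one measurement predicts the other well; as the slide notes, it still says nothing about whether one causes the other.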
62. Scatter Plot
63. Resources on Statistics
64. Good Books on Statistics
The Drunkard's Walk
Fooled by Randomness
Statistics Hacks
http://astore.amazon.com/suzishapiroco-20