Teaching Students How (Not) to Lie with Statistics



Lynette Hoelter
American Sociological Association

August 23, 2015

Presentation Outline:
• Statistics as social construction
• Questioning evidence
• Practice, practice, practice
• Ways stats can “catch” us
• Sources of “numbers” for practice

Numbers lend “authority”
• Make arguments seem more “scientific”
• Appear definitive

but, sometimes…
• Sources are given more credibility than they should be (e.g., “Univ. of Michigan data suggest” referring to results from a study of UM students)
• Key information needed to evaluate is missing and/or numbers are taken out of context

Numbers as social construction
• Evidence is evidence, right?
• Numbers/statistics do not exist apart from people
– Who counted?
– What exactly did they count?
– Why did they count it?
• Quantitative literacy is first step, then add sociology (or vice versa)

Questions to ask upon sighting data¹
• What is the source of the statement and/or data?
• How is the information reported?
• Is the sample of adequate size and representative?

¹ Adapted from Healey, Joseph E., 2013. The Essentials of Statistics: A Tool for Social Research (3rd Ed). Belmont, CA: Wadsworth, Cengage Learning.

We ALL need practice
• Using data in (any) class:
– Start class with data
– Tie survey data to topic of lecture
– Use real data as examples for problems or exams
– Require evidence-based arguments

Easy Example:
EXTRA CREDIT: The charts below were part of a blog post by the Federal Reserve Bank of New York (9/2/2014) and demonstrate two ways of looking at the value of a college degree. Net Present Value represents the additional income earned by someone with a Bachelor’s degree compared to someone without, added over a 40+ year working life. In a couple of sentences, describe the trends in each chart and then answer the question: Is a college degree worth it? Why or why not? (5 points)

Ways stats can “catch” us
• Definition issues
• Big numbers
• Proper measure of central tendency
• Percent/percent change
• Risks/Rates
• Correlation & causation
• Trends over time
• Statistical vs substantive significance
• Funky graphics
• Reducing complexity of social patterns

Definition Issues
• What was included, what was excluded?
• How was a “positive” defined?
• If looking at cost/benefits – really measuring all costs/benefits? (Compare apples to apples)
• From whom were data collected (sampling)?

Source: http://mediamatters.org/research/2012/10/01/a-history-of-dishonest-fox-charts/190225

Definitions (cont’d)
• Rates = fairly straightforward
• US Divorce Rate – commonly reported ~ 50%
• Numerator is easy (formal divorces?)
• Denominator??
– All current marriages
– All first marriages
– All marriages in one year
• Large differences by age at first marriage, number of previous marriages, etc.
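The denominator question can be made concrete with a short sketch. Every count below is hypothetical, invented only to show how a single numerator produces very different "divorce rates" depending on what it is divided by; none of these are real US figures.

```python
# Hypothetical counts for illustration only (not real US data).
divorces_this_year = 800_000

denominators = {
    "all current marriages": 60_000_000,
    "all first marriages": 45_000_000,
    "marriages begun this year": 2_000_000,
}


def rate_percent(numerator, denominator):
    """Return the numerator as a percentage of the denominator."""
    return 100 * numerator / denominator


# Same numerator, three candidate denominators.
rates = {
    label: rate_percent(divorces_this_year, count)
    for label, count in denominators.items()
}

for label, value in rates.items():
    print(f"Divorces per 100 of {label}: {value:.1f}")
```

With these made-up counts the "rate" runs from under 2 per 100 to 40 per 100, which is why a reported rate means little until its denominator is stated.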

Definition of credit card fraud given on site: Credit card fraud is a theft committed using a credit card or debit card, as a fraudulent source of funds in a transaction. The purpose may be to obtain goods without paying, or to obtain unauthorized funds from an account. According to the United States Federal Trade Commission, while identity theft had been holding steady for the last few years, it saw a 21 % increase in 2008.

No hint as to whether denominator includes all Americans, Americans with credit cards, etc.
Source: www.statisticbrain.com/credit-card-fraud-statistics/

Big Numbers
• Shock value
• No context
• More memorable
– Annual deaths from flu, 1976-2006, range from 3,000 to 49,000
– 49,000 is a lot, isn’t it?!
– 1,715,434 deaths in US in 2015 so far
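One way to give the 49,000 figure some context is to express it as a share of all deaths. The sketch below uses only the two numbers on the slide. Note the assumption it glosses over: the flu figure covers a full year while the 2015 total is a partial-year count, so the share shown is, if anything, overstated.

```python
flu_deaths_high = 49_000           # upper end of the flu range on the slide
us_deaths_2015_so_far = 1_715_434  # partial-year total quoted on the slide

# Share of all deaths that the worst flu year would represent.
share = 100 * flu_deaths_high / us_deaths_2015_so_far

print(f"Worst-case flu deaths as a share of all deaths: about {share:.1f}%")
```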

Providing Context for Big Numbers
• Using seconds¹:
– One million seconds ~ 11.6 days (86,400 seconds = 1 day)
– One billion seconds ~ 31.7 years
• Using $$: $17 Trillion US Debt
• Population sizes²:
– 100,000 people ~ South Bend, IN
– 1,000,000 people ~ San Jose, CA or Austin, TX; Montana or Rhode Island
– 10,000,000 people ~ North Carolina or Georgia
– US Pop. = 320,145,187 (320 million)
– China Pop. = 1,393,783,836 (1.39 billion)
– World Pop. = 7,361,779,045 (7.36 billion)

¹ Paulos, 2001
² US Census and Worldometers.com
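The conversions on this slide are easy to check, and the same approach scales a dollar figure down to a per-person amount. This is a minimal sketch: the 365.25-day year is an assumption, and the per-person debt simply divides the slide's $17 trillion by the slide's US population figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400
DAYS_PER_YEAR = 365.25          # assumption: average year, including leap years


def seconds_to_days(seconds):
    return seconds / SECONDS_PER_DAY


def seconds_to_years(seconds):
    return seconds_to_days(seconds) / DAYS_PER_YEAR


million_days = seconds_to_days(1_000_000)
billion_years = seconds_to_years(1_000_000_000)

us_debt = 17e12               # $17 trillion, from the slide
us_population = 320_145_187   # from the slide
debt_per_person = us_debt / us_population

print(f"One million seconds is about {million_days:.1f} days")
print(f"One billion seconds is about {billion_years:.1f} years")
print(f"US debt per person is roughly ${debt_per_person:,.0f}")
```

Putting a trillion-dollar total on a per-person basis gives students a number they can compare with something familiar, such as a yearly salary.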

Central Tendency
• Plays on our understanding of “average”
• Skewed distributions should be summarized with the median
– E.g., “Average” household income in US, 2011
   • Median: $50,502
   • Mean: $69,821
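The gap between mean and median is easy to reproduce with a small right-skewed sample. The ten incomes below are hypothetical, chosen only so that one very high value pulls the mean upward; they are not drawn from the 2011 data.

```python
from statistics import mean, median

# Hypothetical household incomes: nine modest values and one very large one.
incomes = [22_000, 31_000, 38_000, 45_000, 50_000,
           56_000, 64_000, 75_000, 90_000, 400_000]

avg = mean(incomes)    # pulled upward by the single 400,000 household
mid = median(incomes)  # midpoint of the two middle values; unaffected by the outlier

print(f"Mean:   ${avg:,.0f}")
print(f"Median: ${mid:,.0f}")
```

Here eight of the ten households earn less than the mean, which is the sense in which "average" can mislead for skewed data.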

Percent/Percent Change
• Beware of percentages in tables
– Make sure they add to 100% within each category of the independent variable
• Percent change
– Each calculation changes the base
– Why a 50% off sale is not the same as 20% off plus an additional 30% off
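The changing-base point can be shown with a few lines of arithmetic. This is a minimal sketch using a hypothetical $100 item; `apply_discounts` is an illustrative helper, not something from the presentation.

```python
def apply_discounts(price, *percents_off):
    """Apply each discount in turn; every step uses the already-reduced price as its base."""
    for pct in percents_off:
        price *= 1 - pct / 100
    return price


original = 100.0  # hypothetical sticker price

single = apply_discounts(original, 50)       # one 50% discount
stacked = apply_discounts(original, 20, 30)  # 20% off, then 30% off the reduced price

print(f"50% off:          ${single:.2f}")
print(f"20% then 30% off: ${stacked:.2f}")
print(f"Effective stacked discount: {100 * (1 - stacked / original):.0f}%")
```

The second discount applies to $80 rather than $100, so the stacked offer is a 44% reduction, not 50%.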

Risks & Rates
Risk of developing breast cancer in the next 10 years goes up by 230% from age 30 to 40, and by 58% from age 40 to 50.

From: http://www.cdc.gov/cancer/breast/statistics/age.htm
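A large relative increase can still leave a small absolute risk, which a short calculation makes visible. In the sketch below the percent increases come from the slide, but the baseline risk at age 30 is a hypothetical placeholder chosen for illustration; consult the CDC source for the actual values.

```python
def risk_after_increase(baseline_pct, increase_pct):
    """Return the absolute risk (in percent) after a relative increase."""
    return baseline_pct * (1 + increase_pct / 100)


baseline_at_30 = 0.4  # HYPOTHETICAL 10-year risk at age 30, in percent

risk_at_40 = risk_after_increase(baseline_at_30, 230)  # +230%, from the slide
risk_at_50 = risk_after_increase(risk_at_40, 58)       # +58%, from the slide

for age, risk in [(30, baseline_at_30), (40, risk_at_40), (50, risk_at_50)]:
    print(f"Age {age}: {risk:.2f}% (about 1 in {round(100 / risk)})")
```

Under this assumed baseline, a 230% jump moves the risk from about 1 in 250 to about 1 in 76: a real change, but far less alarming than the percentage alone suggests.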

Correlation vs. Causation

• From: Spurious Correlations

Trends (or “Trends”) over Time
• Legends of charts
• Time frame presented can change interpretation
• Changes in defining/reporting
• Be wary of trends that suddenly change direction (life doesn’t move that quickly)

Incidents were classified as school shootings when a firearm was discharged inside a school building or on school or campus grounds, as documented by the press or confirmed through further inquiries with law enforcement. Incidents in which guns were brought into schools but not fired, or were fired off school grounds after having been possessed in schools, were not included.

“Funky” Graphics

All examples from http://flowingdata.com/category/statistics/mistaken-data/

Simplifying Complex Processes
• Identifying one event/process/change as affecting change in complex process
– E.g., “Broken Windows” theory of crime

In Short:

• Get students thinking about numbers and their context as early and often as possible

Websites to Start Your Search
• ABCNews Who’s Counting (Paulos’ column)
• Association of Religion Data Archives Learning Center
• Choosing a Good Chart (decision table)
• Data360
• Gapminder
• ICPSR: Resources for Instructors
– Data-driven Learning Guides
• Pew Research Center: Fact Tank, Reports, Datasets, Interactives
• Population Pyramids of the World
• Social Explorer: US mapping
• Social Science Data Analysis Network
• Spurious Correlations
• Statistic Brain
• Stats.org
• Survival Curve
• TeachingWithData.org
• Worldometers, USA Live Stats
• Public Opinion:
– Gallup Organization
– National Opinion Research Center (GSS Explorer)
– Roper Center (iPoll)
• Government Centers such as the Census (American FactFinder), NCES, or NCHS
• Professional Development:
– Science Education Resource Center (Carleton College)
– TeachQR.org (Lehman College)
– Making Data Meaningful (United Nations Economic Commission for Europe)
• International:
– UK Data Services Teaching with Data
– European Social Survey EduNet

(A Few) Interesting Reads:
Best, Joel. 2012. Damned Lies and Statistics: Untangling Numbers from the Media, Politicians, and Activists (2nd Ed). Berkeley: University of California Press.
Best, Joel. 2004. More Damned Lies and Statistics: How Numbers Confuse Public Issues. Berkeley: University of California Press.
Huff, Darrell. 1993. How to Lie With Statistics (2nd Ed). New York: W.W. Norton & Company.
Klass, Gary. 2012. Just Plain Data Analysis: Finding, Presenting, and Interpreting Social Science Data (2nd Ed). New York: Rowman & Littlefield Publishers, Inc.
Paulos, John Allen. 2001. Innumeracy: Mathematical Illiteracy and Its Consequences (2nd Ed). New York: Hill & Wang.
Silver, Nate. 2012. The Signal and the Noise: Why So Many Predictions Fail – But Some Don’t. New York: Penguin Group (USA).

Questions? Comments? Suggestions?

Lynette Hoelter: lhoelter@umich.edu