Lies, Damned Lies, and Health Physics: Some Random Comments About Statistics in Health Physics


Page 1: Lies, Damned Lies, and Health Physics: Some Random Comments About Statistics in Health Physics

Savannah River Chapter of the Health Physics Society

Aiken, SC, April 15, 2011

Tom LaBone

Page 2

“It is easy to lie with statistics.” “It is hard to tell the truth without statistics.”

Andrejs Dunkels

“There are three kinds of lies: lies, damned lies, and statistics.”

Mark Twain

Page 3: Today

- Informal, mostly apocryphal discussion of
  - what statistics really is,
  - who practices statistics and how they do it, and
  - why all of this is important to you as a health physicist
- Main message of talk
  - A good working knowledge of statistics is essential in any endeavor where data are collected and analyzed (e.g., health physics)
  - Everyone in the room should become a statistician (of sorts)
- No math is used in this presentation and no health physicists were harmed during its preparation

Page 4: Health Physics and Statistics

- Some HP “stat” books I used in school
  - G. F. Knoll, Radiation Detection and Measurement, 1st Edition, 1979
  - J. Shapiro, Radiation Protection, 1st Edition, 1972
  - H. Cember, Introduction to Health Physics, 1st Edition, 1969
  - R. D. Evans, The Atomic Nucleus, 1955
  - P. R. Bevington, Data Reduction and Error Analysis for the Physical Sciences, 1st Edition, 1969
- Statistics was a tool, a “wrench to turn a nut”
  - Is that all it is?

Page 5: What is Statistics?

“Humans are good, she knew, at discerning subtle patterns that are really there, but equally so at imagining them when they are altogether absent.”

Carl Sagan, in Contact

Page 6: Signals and Noise

- Useful information comes to us in the form of signals that form distinct patterns
- The signals are contaminated with varying degrees of noise, which can make it difficult to see the signal

Page 7: Seeing Patterns

- In our evolutionary history, seeing patterns where none existed may have been less harmful than missing patterns that did exist
  - That noise in the grass: is it just the wind or is it a lion?
- So, we as a species got very good at seeing patterns, even in the absence of a signal

Page 8: Apophenia

- Apophenia is the experience of seeing meaningful patterns or connections in random or meaningless data
- What do you see below?

Page 9: Face on Mars

[Images: Viking 1 Orbiter, Mars Global Surveyor]

Page 10: Face in Food, et cetera

Page 11: Face in Data

Page 12: Statistics is …

- … a science that helps us to differentiate signal from noise and make decisions with a known probability of being wrong
- … a very practical, decision-oriented methodology developed to tame our natural tendency to be Apopheniacs
- … based on the idea that variability and noise are natural and unavoidable
- … a relatively modern science that is actively evolving
  - especially since cheap, powerful computers became available
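To make "a known probability of being wrong" concrete, here is a minimal sketch in Python. It is not from the talk. It simulates measurements that contain only noise, applies a decision threshold chosen for a 5% false-alarm rate, and counts how often noise is declared to be a signal. The Gaussian noise model, the 5% level, and the name `false_alarm_rate` are all assumptions made for illustration.

```python
import random
from statistics import NormalDist

def false_alarm_rate(alpha=0.05, n_trials=100_000, seed=1):
    """Fraction of pure-noise measurements that exceed the decision threshold."""
    rng = random.Random(seed)
    # One-sided threshold: a standard normal value exceeds it with probability alpha.
    threshold = NormalDist().inv_cdf(1.0 - alpha)
    alarms = sum(1 for _ in range(n_trials) if rng.gauss(0.0, 1.0) > threshold)
    return alarms / n_trials

if __name__ == "__main__":
    print(f"Observed false-alarm rate: {false_alarm_rate():.4f}")
```

The point of the design is that the error rate is chosen before the data are examined, so the observed fraction of false alarms should sit close to the chosen alpha.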

Page 13: Really, What is Statistics?

“Statistics is concerned with collecting, analyzing, and interpreting data in the best possible way, where the meaning of ‘best’ depends on the particular circumstances of the practical situation.”

Chris Chatfield, Problem Solving: A Statistician’s Guide

Page 14: Exploratory Data Analysis

- Look at data (usually with graphics) and use our ability to see patterns in the data to
  - Suggest hypotheses to test
  - Assess validity of assumptions on which statistical inference will be based
  - Support the selection of appropriate inferential tests
  - Suggest ideas for further data collection

Page 15: Fecal Samples and Air Filters

[Figure: two scatterplot matrices of Pu239, Cm244, and Am241, with points labeled by sample number. Fecal Samples panel, titled "Fecals as of 3/5/2011": Slope = 0.236, Slope = 1.38, Slope = 4.56. Air Filters panel, titled "Kinectrics Filters All": Slope = 0.316, Slope = 2.02, Slope = 6.09.]

Page 16: Confirmatory Data Analysis

- Use statistical tests to answer questions about the data along with the risks of reaching the wrong conclusion
  - Is the material on the filters the same material that is in the fecal samples?
  - Are the Pu-239 to Am-241 ratios in the fecal samples and air samples the same once we account for random noise?

Page 17: Fecal Samples

[Figure: Am-241 (mBq) versus Pu-239 (mBq) for the fecal samples. 95% CI = (1.33, 1.46)]
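A confidence interval like the one on this plot can come from fitting a straight line to the paired activities. The talk does not say how its interval was computed, so the following Python sketch is only one plausible approach: a least-squares line forced through the origin, with an approximate interval from a normal approximation. The data are simulated, and the true ratio of 1.4, the noise level, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist

def slope_through_origin(x, y, level=0.95):
    """Least-squares slope of y = b*x (no intercept) with an approximate CI."""
    n = len(x)
    sxx = sum(xi * xi for xi in x)
    b = sum(xi * yi for xi, yi in zip(x, y)) / sxx
    # Residual variance with n - 1 degrees of freedom (one fitted parameter).
    s2 = sum((yi - b * xi) ** 2 for xi, yi in zip(x, y)) / (n - 1)
    se = math.sqrt(s2 / sxx)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)  # normal approximation
    return b, (b - z * se, b + z * se)

def simulate_samples(true_ratio, n, noise_sd, seed):
    """Made-up (x, y) activity pairs; NOT the data shown in the talk."""
    rng = random.Random(seed)
    x = [rng.uniform(0.5, 12.0) for _ in range(n)]
    y = [true_ratio * xi + rng.gauss(0.0, noise_sd) for xi in x]
    return x, y

if __name__ == "__main__":
    x, y = simulate_samples(true_ratio=1.4, n=200, noise_sd=1.0, seed=42)
    b, (lo, hi) = slope_through_origin(x, y)
    print(f"slope = {b:.3f}, approx. 95% CI = ({lo:.3f}, {hi:.3f})")
```

For small samples a t quantile would be more appropriate than the normal quantile used here. Real counting data may also have noise that grows with activity, which this sketch ignores.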

Page 18: Data Dredging

- Are the two Pu-239 to Am-241 ratios the same?
  - If this question was asked before we saw the data, we can proceed with the test to answer it
  - If this question was inspired by the data, then we should not test the same data to get the answer
- Referred to as data snooping, data dredging, etc.
  - Cancer clusters
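The reason for this rule can be shown with a small simulation. It is a sketch under stated assumptions rather than anything taken from the talk. Every group below is pure noise, so any "significant" result is a false positive. The code compares testing one group chosen in advance with testing whichever group looks most extreme after the data are in. The group count, sample sizes, and the name `dredging_demo` are hypothetical.

```python
import random
from statistics import NormalDist

def dredging_demo(n_groups=20, n_per_group=25, alpha=0.05, n_studies=2000, seed=7):
    """Compare false-positive rates for a pre-specified vs. a data-inspired test.

    Every group is pure noise (true mean 0), so any 'significant' result is false.
    """
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    planned_hits = 0
    dredged_hits = 0
    for _ in range(n_studies):
        # z statistic of each group's mean (known unit variance, for simplicity).
        z_scores = []
        for _ in range(n_groups):
            mean = sum(rng.gauss(0.0, 1.0) for _ in range(n_per_group)) / n_per_group
            z_scores.append(mean * n_per_group ** 0.5)
        # Question asked before seeing the data: test group 0 only.
        if abs(z_scores[0]) > z_crit:
            planned_hits += 1
        # Question inspired by the data: test whichever group looks most extreme.
        if max(abs(z) for z in z_scores) > z_crit:
            dredged_hits += 1
    return planned_hits / n_studies, dredged_hits / n_studies

if __name__ == "__main__":
    planned, dredged = dredging_demo()
    print(f"pre-specified test: {planned:.3f}, data-inspired test: {dredged:.3f}")
```

With 20 independent groups tested at the 5% level, the chance that at least one looks significant by luck alone is 1 − 0.95^20, or about 0.64. That is why a cluster noticed first and tested afterwards on the same data proves so little.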

Page 19: Statistical Method

- Define the problem
  - Formulate your questions in such a way that unambiguous answers are possible
- Collect data
  - Collect data capable of answering your question
- Analyze the data
- Present the results
  - in terms your audience can understand

Page 20: Define the Problem

“It is better to solve the right problem the wrong way than to solve the wrong problem the right way.”

Richard Hamming

“An approximate answer to the right problem is worth a good deal more than an exact answer to an approximate problem.”

John Tukey

Page 21: Data Collection

- Collect data that are capable of answering the question asked (Data Quality Objectives)
  - Designed experiments
  - Observational studies
- Sampling
  - You select samples from a population in order to make inferences about the population

Page 22: GIGO

- The collection of data is often the most time-consuming and expensive part of a study
- Reverend Bayes and all of his horses can’t fix a bum dataset

Page 23: Analyze the Data

- All statistical procedures have assumptions
  - In practice, the assumptions of any given statistical procedure are violated to some degree
  - Can the validity of the assumptions be verified?
  - Can the validity of the answer be verified?
- How robust is your statistical procedure to violations of its assumptions?
- Simple approximate solutions you can understand may be better than complex exact solutions that you can’t
- Augment standard statistical analyses with simulations
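As an illustration of the last point, the sketch below uses simulation to check how well a textbook interval holds up when its assumptions are bent. It is not from the talk. The interval (mean ± z·s/√n), the sample size of 10, the exponential distribution standing in for skewed data, and the name `coverage` are all assumptions chosen for the example.

```python
import math
import random
from statistics import NormalDist, mean, stdev

def coverage(draw, true_mean, n=10, level=0.95, n_sims=5000, seed=3):
    """Fraction of simulated intervals (mean +/- z*s/sqrt(n)) that contain true_mean."""
    rng = random.Random(seed)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    hits = 0
    for _ in range(n_sims):
        sample = [draw(rng) for _ in range(n)]
        half_width = z * stdev(sample) / math.sqrt(n)
        if abs(mean(sample) - true_mean) <= half_width:
            hits += 1
    return hits / n_sims

if __name__ == "__main__":
    normal_cov = coverage(lambda rng: rng.gauss(0.0, 1.0), true_mean=0.0)
    skewed_cov = coverage(lambda rng: rng.expovariate(1.0), true_mean=1.0)
    print(f"nominal 95%: normal data {normal_cov:.3f}, skewed data {skewed_cov:.3f}")
```

Comparing the simulated coverage with the nominal 95% shows how much the small sample and the skewness each cost, which is the kind of robustness check the slide asks about.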

Page 24: Present Results

- Technical answer versus the functional answer
  - “the null hypothesis is not rejected”
  - technically, “not rejected” ≠ “accepted”
  - functionally, “not rejected” = “accepted”
- Statistical significance and practical significance
  - Apply “so what” test to your answers
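The gap between statistical and practical significance can be shown with a short calculation. This is a sketch with hypothetical numbers, not an example from the talk. It assumes a z-test with known standard deviation, and the name `two_sided_p_value` is illustrative. The same tiny difference, 1% of one standard deviation, is tested at three sample sizes.

```python
import math

def two_sided_p_value(mean_difference, sd, n):
    """p-value of a z-test of 'true mean difference is zero' with known sd."""
    z = mean_difference / (sd / math.sqrt(n))
    # For a standard normal Z, P(|Z| > |z|) equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2.0))

if __name__ == "__main__":
    # Hypothetical numbers: an observed difference of 1% of one standard deviation.
    for n in (100, 10_000, 1_000_000):
        p = two_sided_p_value(mean_difference=0.01, sd=1.0, n=n)
        print(f"n = {n:>9,}: p = {p:.3g}")
```

The difference never changes, only the amount of data. A large enough sample makes it "significant", yet the "so what" question remains whether a 1% shift matters for the decision at hand.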

Page 25: What is a Statistician?

“Powerful spirits should only be called by the master himself”

Goethe, The Sorcerer's Apprentice

Page 26: What is a Statistician?

- Based on Chatfield’s definition of statistics, anyone who makes decisions based on the analysis of data might be called a statistician
- However, the title statistician is usually reserved for a professional who has specialized training in the concepts, theoretical bases, and methodologies of statistics
- Key difference between the sorcerer and his apprentice
  - Contrary to what you might think, there is a lot of subjectivity and professional judgment in the practice of statistics
  - Statistics is vast in scope and detail, and the apprentice does not know what he does not know

“It ain't what you don't know that gets you into trouble. It's what you know for sure that just ain't so.”

Mark Twain

Page 27: The Sorcerer’s Apprentice

- We may not be statisticians, but we are clearly doing statistics, often without adult supervision
- Doing our own statistics is a good thing, but we need to become better students of the black arts and consult the master before the brooms get out of control

“Should I refuse a good dinner simply because I do not understand the processes of digestion?”

Oliver Heaviside

[On being criticized for using formal mathematical manipulations without understanding how they worked]

Page 28: How We Can be Better Statisticians

- Master the basics
- Learn the language
- Play with your data
- Use better software
- Perform reproducible work
- Consult with a real statistician

Page 29: Master the Basics

- Khan Academy
  - http://www.khanacademy.org/

Page 30: Statistics MS/Certificate Distance Programs

- University of South Carolina
- Colorado State University
- Texas A&M University
- Penn State University

Page 31: Concepts and Terminology

- Specialized concepts
  - Population versus sample, for example
- Statistics has a very precise language all its own
  - “the null hypothesis is not rejected”
  - “not rejected” ≠ “accepted”
- Questions and answers are not right unless you use the proper language to convey the proper concept
  - Some statisticians can be intolerant of laymen who misuse the language of statistics
- Learn to phrase questions and interpret answers properly

Page 32: Exploratory Statistics

- Learn to play with your data and see if it is trying to tell you something new
- Study graphs of your data

“There is no data that can be displayed in a pie chart, that cannot be displayed BETTER in some other type of chart.”

John Tukey

Page 33: Software used for Statistics

- I use the following software for statistical calculations (in order of usage)
  - R
  - Minitab
  - SAS
  - Spreadsheet (e.g., MS Excel, Gnumeric)
- There are many others

Page 34: Spreadsheets (Excel)

- What some people can do in Excel is nothing short of amazing (but should they be doing it?)
  - Amarillo Slim beat tennis champ Bobby Riggs at Ping-Pong, using a frying pan instead of a paddle
- Spreadsheet Addiction by Patrick Burns
  - http://lib.stat.cmu.edu/S/Spoetry/Tutor/spreadsheet_addiction.html
- Problems with spreadsheet implementation
  - Excel has a long history of doing bad stats
- Problems with spreadsheet paradigm
  - Reproducible science

Page 35

9/28/2007

http://www.msnbc.msn.com/id/21033161/from/RS.1/

M. G. Almiron et al., On the Numerical Accuracy of Spreadsheets, Journal of Statistical Software 34(4), 2010
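One generic way that statistical software can lose numerical accuracy is the one-pass "calculator" formula for the variance. This example is not taken from the cited paper or article. It is a sketch in Python with illustrative data and function names, comparing that formula with the two-pass calculation on values that have a small spread on top of a large offset.

```python
def one_pass_variance(values):
    """'Calculator' formula: (sum of squares - n * mean^2) / (n - 1)."""
    n = 0
    total = 0.0
    total_sq = 0.0
    for v in values:
        n += 1
        total += v
        total_sq += v * v
    mean = total / n
    return (total_sq - n * mean * mean) / (n - 1)

def two_pass_variance(values):
    """Compute the mean first, then sum squared deviations from it."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

if __name__ == "__main__":
    # Small spread on top of a large offset; the exact sample variance is 30.
    data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]
    print("one-pass:", one_pass_variance(data))
    print("two-pass:", two_pass_variance(data))
```

The one-pass version subtracts two nearly equal numbers of order 10^18, so the rounding error in each swamps the true answer. The two-pass version works with small deviations and avoids the cancellation.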

Page 36: Reproducible Research

- Reproducible research refers to the idea that the ultimate product of research is the paper along with the full computational environment used to produce the results in the paper, such as the code, data, etc. necessary for reproduction of the results

Raw Data → Data Massaging → Calculations → Plots and Tables → Final Paper
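The chain from raw data to final paper can be written as one script, so that every step is recorded and can be rerun. The following Python sketch is a hypothetical illustration and not the author's workflow. The function names, the simulated "raw data", and the -999 missing-value flag are all assumptions.

```python
import random
from statistics import mean, stdev

def load_raw_data(seed=2011):
    """Stand-in for reading raw measurements; the values here are simulated."""
    rng = random.Random(seed)  # fixed seed so every run sees the same 'raw data'
    return [rng.gauss(10.0, 2.0) for _ in range(50)] + [-999.0]  # -999 = missing flag

def massage(raw):
    """Data massaging: drop the missing-value flag, recording the rule in code."""
    return [v for v in raw if v != -999.0]

def calculate(clean):
    """Calculations: summary statistics that feed the tables."""
    return {"n": len(clean), "mean": mean(clean), "sd": stdev(clean)}

def make_table(results):
    """Plots and tables: render the numbers as they would appear in the paper."""
    return "\n".join(f"{name:<5}{value:>10.3f}" for name, value in results.items())

def run_pipeline():
    """Run every step in order, from raw data to the finished table."""
    return make_table(calculate(massage(load_raw_data())))

if __name__ == "__main__":
    print(run_pipeline())
```

Because no step is done by hand, rerunning the script regenerates the same table. A reviewer can also read exactly which values were dropped and why.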

Page 37: The R Project for Statistical Computing

- R is a language and environment for statistical computing and graphics
- R is available as Free Software under the terms of the GNU General Public License in source code form
- It compiles and runs on a wide variety of UNIX platforms and similar systems (including FreeBSD and Linux), Windows and MacOS
- Download from http://www.r-project.org/

Page 38: Advantages of R

- Command line interface rather than a GUI
  - Promotes reproducible statistics
- Open source
  - Flexible licensing
  - Availability of source code for peer review
  - Bugs are public knowledge and are fixed quickly
  - New tests and methods tend to appear first in R
- Many dozens of recently published books devoted to R
- Free (and very good) community support available

Page 39: Consult with a Statistician

- If you are going to involve a statistician, do it at the study design and data collection phases
  - If not, at least estimate how much it will cost to collect the data all over again
  - Anybody can analyze compelling data

“To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem examination: he may be able to say what the experiment died of.”

Sir Ronald Fisher

Page 40: Twisted Answers to Crooked Questions

- As health physicists there are times when a decision will be made, with or without good data and a proper statistical analysis
- In such situations we base our decisions on professional judgment, often augmented with “statistics”
- We must not fool ourselves about what we are doing
  - … of all the wrong answers we have to choose from, this one is the best
- We have no right to expect a statistician to endorse such mischief

Page 41: The Apprentice Should Beware of …

- The Management Prior
- Being bamboozled by other people’s statistics
- “The only right way to do this is X [insert statistical method here]”
- Being seduced by complexity

Page 42: Statistics in the Workplace: Musings of a Sorcerer's Apprentice

- Presentation to USC Stat Club, March 26, 2009
- Main message
  - A degree in statistics is a “Swiss Army Knife” that is very useful in any endeavor where data are collected and analyzed
  - Everyone in the room should become a health physicist (I had no takers)