Why do statisticians answer silly questions that no one ever asks?

“Other people don’t understand us”, wail statisticians. But do statisticians try to understand what other people want? Statisticians answer silly questions that no one in their right minds would ever ask, says Matt Briggs. Instead they should give useful answers to the questions that people actually do ask.


Most statistical practice is designed to answer questions that nobody asked and nobody wants answered. What is worse is that civilians – that is, non-statisticians – believe that the answers they receive from statistical analyses are in response to the questions they have asked. These souls thus go away more confident about what they have learned than they have a right to be.

A sociologist wants to know if demographic group A is more put upon than group B, controlling for this or that variable. A political scientist is interested in whether policy A results in more votes for his candidate than policy B. Or a company wants to discover which drug is better, A or B. So they design a study and collect data, and the result for the drug company will be that so many patients improved using A, and so many improved using B. At this point, no one in the world wants to know what the chance is that, for this experimental group, A was better than B – better in the sense that a greater proportion of patients fed A improved.

Nobody wants to know this because it is a silly question, trivially answered. We know exactly what the chance is, because the event has already happened. All we have to do is look at the data and see that either A was better or B was. The probability that A was better, given this evidence, is either 1 or 0.

The question is exactly as silly as asking what the chance is of a certain horse winning the 2011 Kentucky Derby. The race has already been run. It happened last May. Animal Kingdom won it. Anyone now who looks for a bookie to take his money on any other result would be off his head.

What the drug company actually wants to know is, given the evidence from the trial, and possibly given other evidence about the two drugs, what are the chances that a greater proportion of future patients will get better if they take A instead of B? They might complicate this question with information about the regulation, cost, supply, or even politics of the drugs. They will ask something that can be observed about patients they have not yet seen.

Classical statistical practice does not answer these questions, nor anything like them. Instead, proxy questions, which have at best a vague similarity to the actual questions, are asked. The problem is that civilians do not understand that confidence in answering these curious proxies rarely or never translates into confidence about answers to actual questions. The result is that the world is filled with triumphant declarations of surety regarding this finding or that, all supposedly blessed by statistics – and all of that happy surety is unjustified.

If one follows frequentist practice, the proxy question is: what is the probability, if the experiment were rerun an indefinite number of times, and each time it was rerun a test statistic was calculated – one of many that could be picked – that these repeat trials’ test statistics would be larger than the one actually seen for this data, all assuming that some probability model is true and without error and assuming the parameters of that model are set equal to 0 (or some other number)? As I said at the start, this is not a question that anyone would actually ask, and not just because they could not manage it all in one breath.
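For the concrete-minded, one instance of such a rerun-the-experiment procedure is a permutation test. The sketch below is mine, not the article’s; the trial counts (62 of 100 improved on A, 55 of 100 on B) are invented for illustration, since the article supplies no data.

```python
# A minimal sketch of the proxy procedure just described, as a one-sided
# permutation test. All trial counts are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Outcomes: 1 = improved, 0 = did not. First 100 patients took A, rest took B.
improved = np.concatenate([np.ones(62), np.zeros(38), np.ones(55), np.zeros(45)])
arm_a = np.zeros(200, dtype=bool)
arm_a[:100] = True

# The chosen test statistic: difference in observed improvement proportions.
observed = improved[arm_a].mean() - improved[~arm_a].mean()

# "Rerun" the experiment many times under the supposition that the arm labels
# are irrelevant, recomputing the test statistic each time.
reruns = []
for _ in range(10_000):
    shuffled = rng.permutation(arm_a)
    reruns.append(improved[shuffled].mean() - improved[~shuffled].mean())

# The p-value: the chance of a rerun statistic at least as large as the one seen.
p_value = np.mean(np.array(reruns) >= observed)
print("one-sided permutation p-value:", p_value)
```

Note what the answer is a probability *of*: hypothetical repeat-experiment statistics, not the drugs.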

The answer to it is, of course, the p-value, a probability concept so misaligned with intuition that no civilian can hold it firmly in mind (nor, judging by what finds its way into textbooks, can more than a few statisticians). Hearing only that the p-value dipped below a sanctified, not-to-be-questioned level, and hence that the results are “statistically significant”, the civilian assumes that his original question has been answered and that the probability that A is better is high.

But he should not believe this, because of course it might not be true. The p-value can be small, as small as you like, and the probability that A is better could still be low. This is so even if the model is true and the test statistic chosen is the “right” one for the problem at hand. Given that it is unlikely the model is true, and given that the test statistic is often chosen for the convenience of the statistician and not the civilian, it is more than likely that the civilian is even farther from having his question answered than he thought. His confidence will remain unabated, however.

The situation is improved when one moves to considering posterior distributions of parameters of models. But not by much. The proxy question becomes: what is the probability that the parameter representing improvement under A is larger than that representing B, given that the model is true and given the evidence from the experiment? This question is more in line with what the civilian desires, but it is still an evasion. What, after all, do some unobserved and unobservable parameters have to do with the civilian’s query?

Just as with p-values, it is again the case that knowledge of the values of parameters exaggerates the evidence about the chance that one drug is better than another. One can have almost certain knowledge of the value of some parameter or parameters (as in estimation or hypothesis testing), but this does not translate into a high probability that one drug is better than another.

That is, even though the posterior probability of a difference in parameters is high, it still might be the case that the chance that A is better is low. You can easily convince yourself of this: assume that the uncertainty in future patients improving can be modelled with binomial distributions, with the parameter for A equalling 0.50001 and that for B equalling 0.5. The probability that the parameter for A is greater than that for B is 1 – that is, it is certain that the parameter for A is larger than that for B – but the probability that more patients will improve using A is barely above 50% (regardless of the number of new patients expected).
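A few lines of code make the point concrete. This sketch is my own: the two parameter values come from the text, but the per-arm number of future patients, n = 1000, is an illustrative choice the article leaves open.

```python
# A sketch of the article's binomial example. Parameter values are from the
# text; the per-arm sample size n is my own illustrative choice.
from scipy.stats import binom

p_a, p_b = 0.50001, 0.5   # the parameter for A is *certainly* larger than for B
n = 1000                  # hypothetical number of future patients per arm

# P(more future patients improve on A) = sum_k P(X_B = k) * P(X_A > k),
# with X_A ~ Binomial(n, p_a) and X_B ~ Binomial(n, p_b).
p_a_beats_b = sum(binom.pmf(k, n, p_b) * binom.sf(k, n, p_a) for k in range(n + 1))
p_tie = sum(binom.pmf(k, n, p_b) * binom.pmf(k, n, p_a) for k in range(n + 1))

print(f"P(A arm strictly beats B arm) = {p_a_beats_b:.4f}")  # just under 0.5
print(f"P(tie)                        = {p_tie:.4f}")
print(f"P(A beats or ties B)          = {p_a_beats_b + p_tie:.4f}")  # just over 0.5
```

However you split the ties, the answer hovers at a coin flip: certainty about the ordering of the parameters has bought almost nothing about the ordering of the observables.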

Announce the probability that the parameters are different from each other, or different from, say, some number like 0, and the civilian hears what is not necessarily so: that the probability that A is better is as high as the probability that the parameters are different. But this can never be the case. Certainty in differences in parameters never translates into equal certainty about the differences in observables. Unless the civilian is made to understand this, his over-certainty is guaranteed. And we must not forget that we are still assuming that the model is true.

The situation is improved once more, but not solved, by moving to predictive statistics. In the language of Bayes, this is the method of displaying posterior predictive distributions. The term does not exist natively in the frequentist tongue, though predictive methods are possible there.

Predictive methods eliminate proxy questions: they tell us directly the chance that A is better, which is what the civilian wants to know, and they tell him in a language matched to his intuition. If the civilian wants to learn the chance that twice as many, or three times as many, future patients will get better using A than if they use B, controlling for this or that suite of variables, or if he wants to ask any question about the observable outcomes of new patients, then predictive methods will tell him, and tell him plainly without obfuscatory language.
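To show the contrast between the parameter proxy and the predictive answer side by side, here is a hedged sketch using conjugate Beta-Binomial updating, one standard way to compute a posterior predictive distribution. The trial counts and the uniform priors are my own invented illustration, not the article’s.

```python
# A sketch of the predictive approach via Beta-Binomial conjugate updating.
# Trial counts and priors are invented for illustration.
import numpy as np

rng = np.random.default_rng(0)

# Invented trial results: improvements out of patients per arm.
improved_a, n_a = 62, 100
improved_b, n_b = 55, 100

# Posterior draws of the parameters under uniform Beta(1, 1) priors.
theta_a = rng.beta(1 + improved_a, 1 + n_a - improved_a, size=100_000)
theta_b = rng.beta(1 + improved_b, 1 + n_b - improved_b, size=100_000)

# The proxy question: P(parameter for A exceeds parameter for B | data).
print("P(theta_A > theta_B) =", np.mean(theta_a > theta_b))

# The predictive question: among the next n_future patients per arm, what is
# the chance that more improve on A than on B?
n_future = 100
future_a = rng.binomial(n_future, theta_a)
future_b = rng.binomial(n_future, theta_b)
print("P(more future patients improve on A) =", np.mean(future_a > future_b))
```

Run it and the predictive probability comes out noticeably smaller than the parameter probability: the gap between the two is exactly the over-certainty the article complains of.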

Once more, and inescapably, even with a predictive method there is uncertainty in the model, and uncertainty in the model never finds its way into the results. So even with predictive statistics a civilian is likely to go away over-confident. But he will be far less over-confident than had he been exposed to the usual methods.

So why aren’t predictive methods used more often? Well, they are; at least in some places, and usually by non-statisticians. Computer programs that engage in facial or fingerprint recognition are good examples of predictive models, as are models which are used in automation of any kind, such as in spam filters.

A civilian wants to know which emails are likely to be spam – a predictive question. He would find it intolerable to be offered a p-value on a day’s collection of emails, whose message would be that the null hypothesis that “all today’s email is spam” has been rejected. And he would not want to hear about the certainty in some mysterious parameters. What he wants is an indication of whether each email is spam or is not. If the model is stated in predictive terms – for example, the probability that genuine mail is misclassified as spam – the civilian could accurately gauge if the model is useful to him.
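Stated in predictive terms, such a report can be as plain as the following; the confusion counts are invented for illustration.

```python
# A sketch of reporting a spam filter's performance in predictive terms.
# All counts are invented for illustration.
genuine_total, genuine_flagged = 900, 18   # genuine emails, wrongly flagged
spam_total, spam_caught = 100, 93          # spam emails, correctly caught

print("P(genuine mail misclassified as spam) =", genuine_flagged / genuine_total)
print("P(spam correctly caught)              =", spam_caught / spam_total)
```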

The big reason why predictive methods are not used is that few know of their existence. They are not taught in standard curricula; they are not mentioned in the textbooks used by the vast majority of statistical practitioners. Predictive methods can only be found in a very few graduate courses, tacked on as optional lessons in Bayesian theory. Since all courses cover frequentism in detail, and only extend to Bayesian theory if there is time, the space left over for newer philosophies is minimal.

The lessons of predictive methods are not simple, either. The maths involved is non-trivial. This is a problem because academic statisticians are convinced that their subject is a branch of mathematics, and they look askance at teaching a method without teaching the maths behind that method. But a civilian needs little or no maths to understand what “the probability that A is better than B is 80%” means.

Nor does he need to understand calculus to run statistical software – which, unfortunately, is also lacking for predictive methods. There are no packages that one can download or buy. Statisticians wanting to employ predictive tools are forced to write their own code.

More statisticians should take heed. From the few examples given we see the chief benefit of predictive methods: immediate and ongoing feedback on model performance. If the model stinks, we will know about it in short order; statistical expertise will not be needed to confirm it. That is not so for proxy-question statistical analyses, where oftentimes – especially in areas like sociology, psychology, politics, and so forth – all we get are statements about (highly significant!) p-values, statements which are free of any hint that the model may be faulty, and which produce a false certainty that the results are as sure as the authors thought.

Yet proxy questions often appear to give the right answer, or something like it. People are not keeling over (at least, not often) from drugs tested and analysed using proxy-based methods, for example. This is because the company that tested the two drugs knew, or thought they knew, a lot about A’s chances of besting B before they ran the experiment. They had a great deal of knowledge of chemistry, of biology, of the reactions of other humans and possibly of some rats who ate the drug, and more. Plenty is known about the drugs before the experiment begins – indeed, the experiment is conducted because all this evidence is solidly behind the idea that A will beat B. The proxy questions merely confirm what is already suspected.

But it is predictive methods that most naturally marry to prior evidence and allow us to fill the gaps in the prior knowledge. They do this by directly quantifying what we can expect in the future; they do not spend time telling us stories about what has already happened.

This criticism of statistical practice is not limited to drug or marketing trials, or to the speculations of sociologists and political scientists. People in these areas are just the most vocal among those who are too sure of themselves. Predictive methods can be used for any kind of statistical model, or wherever customary procedures are used. They form a natural extension to Bayesian analysis and apply to any area where we have uncertainty about what we have not yet seen.

So here’s to the hope that more predictive methods will be employed, especially in those areas that are looked to by governments seeking to form policy decisions. It is in those areas par-ticularly, as I hope we can all agree, that people are far too sure of themselves.

Matthew Briggs is a statistical consultant, and is Adjunct Professor of Statistical Science at Cornell University.
