Asian Journal of Economics and Banking (2019), 03(01), 37–80

Asian Journal of Economics and Banking

ISSN 2588-1396

http://ajeb.buh.edu.vn/Home

Reality-Based Probability & Statistics: Solving the Evidential Crisis

William M. Briggs*

Independent Researcher, New York, NY, United States

Article Info

Received: 29/01/2019; Accepted: 14/02/2019; Available online: In Press

Keywords

Artificial intelligence, Cause, Decision making, Evidence, Falsification, Machine learning, Model selection, Philosophy of probability, Prediction, Proper scores, P-values, Skill scores, Tests of stationarity, Verification.

JEL classifications

Abstract

It is past time to abandon significance testing. In case there is any reluctance to embrace this decision, proofs against the validity of testing to make decisions or to identify cause are given. In their place, models should be cast in their reality-based, predictive form. This form can be used for model selection, observable predictions, or for explaining outcomes. Cause is the ultimate explanation; yet the understanding of cause in modeling is severely lacking. All models should undergo formal verification, where their predictions are tested against competitors and against reality.

1 NATURE OF THE CRISIS

We create probability models either to explain how the uncertainty in some observable changes, or to make probabilistic predictions about observations not yet revealed; see e.g. [7, 93, 86] on explanation versus prediction. The observations need not be in the future, but can be in the past but as yet unknown, at least to the modeler.

These two aspects, explanation and prediction, are not orthogonal; neither does one imply the other. A model that explains, or seems to explain well, may not produce accurate predictions; for one, the uncertainty in the observable may be too great to allow sharp forecasts. For a fanciful yet illuminating example, suppose God Himself has said that the uncertainty in an observable y is characterized by a truncated normal distribution (at 0) with parameters 10 and 1 million. The observable and units are centuries until the End Of The World. We are as sure as we are of the explanation of y—if we call knowledge of parameters and the model an explanation, an important point amplified later. Yet even with this sufficient explanation, our prediction can only be called highly uncertain.

*Corresponding author: William M. Briggs, Independent Researcher, New York, NY, United States. Email address: [email protected]

Predictions can be accurate, or at least useful, even in the absence of explanation. I often use the example, discussed below, of spurious correlations: see the website [?] for scores of these. The yearly amount of US spending on science, space, and technology correlates 0.998 with the yearly number of Suicides by hanging, strangulation, and suffocation. There is no explanatory tie between these measures, yet because both are increasing (for whatever reason), knowing the value of one would allow reasonable predictions to be made of the other.
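To make the claim concrete, here is a minimal R sketch using two simulated, causally unrelated but monotonically increasing series; the numbers are invented for illustration and are not the actual spending and suicide data.

    # Two unrelated but increasing series correlate strongly (simulated data).
    set.seed(1)
    year  <- 1999:2009
    spend <- 20 + 0.8 * (year - 1999) + rnorm(length(year), sd = 0.3)   # arbitrary units
    hang  <- 5000 + 150 * (year - 1999) + rnorm(length(year), sd = 60)
    cor(spend, hang)   # close to 1, despite no causal tie
    # Knowing one series still allows a reasonable prediction of the other:
    predict(lm(hang ~ spend), newdata = data.frame(spend = 30))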

We must always be clear what a model's goal is: explanation or prediction. If it is explanation, then while it may seem like an unnecessary statement, it must be said that we do not need a model to tell us what we observed. All we have to do is look. Measurement-error models, incidentally, are not an exception; see e.g. [21]. These models are used when what was observed was not what was wanted; when, for example, we are interested in y but measure z = y + τ, with τ representing the measurement uncertainty. Measurement-error models are in this sense predictive.

For ordinary problems, again we do not need a model if our goal is to state what occurred. If we ran an experiment with two different advertisements and tracked sales income, then a statement like the following becomes certainly true or certainly false, depending on what happened: “Income under ad A had a higher mean than under ad B.” That is, it will be the case that the mean was higher under A or B, and to tell all we have to do is look. No model or test is needed, nor any special expertise. We do not have to restrict our attention to the mean: there is no uncertainty in any observable question that can be asked—and answered without ambiguity or uncertainty.

This is not what happens in ordinary statistical investigations. Instead of just looking, models are immediately sought, usually to tell us what happened. This often leads to what I call the Deadly Sin of Reification, where the model becomes more real than reality. In our example, a model would be created on sales income conditioned on or as a function of advertisement (and perhaps other measures, which are not to the point here). In frequentist statistics, a null significance hypothesis test would follow. A Bayesian analysis might focus on a Bayes factor; e.g. [71].

It is here, at the start of the modeling process, that the evidential crisis has its genesis. The trouble begins because typically the reason for the model has not been stated. Is the model meant to be explanative or predictive? Different goals lead, or should lead, to different decisions, e.g. [78, 79, 78]. The classical modeling process plunges ahead regardless, and the result is massive over-certainty, as will be demonstrated; see also the discussion in Chapter 10 of [11].

The significance test or Bayes factor asks whether the advertisement had any “effect”. This is causal language. A cause is an explanation, and a complete one if the full aspects of a cause are known. Did advertisement A cause the larger mean income? Those who do testing imply this is so, if the test is passed. For if the test is not passed, it is said the differences in mean income were “due to” or “caused by” chance. Leaving aside for now the question whether chance or randomness can cause anything, if chance was not the cause, because the test was passed, then it is implied the advertisements were the cause. Yet if the ads were a cause, they are of a very strange nature. For it will surely be the case that not every observation of income under one advertisement was higher than every observation under the other, or higher in the same exact amount. This implies inconstancy in the cause. Or, even more likely, it implies an improper understanding of cause and the nature of testing, as we shall see.

If the test is passed, cause is implied, but then it must follow the model would evince good predictive ability, because if a cause truly is known, good predictions (to whatever limits are set by nature) follow. That many models make lousy predictions implies testing is not revealing cause with any consistency. Recall cause was absurd in the spurious correlation example above, even though any statistical test would be passed. Yet useful predictions were still a possibility in the absence of a known cause.

It follows that testing conflates explanation and prediction. Testing also misunderstands the nature of cause, and confuses exactly what explanation is. Is the cause making changes in the observable? Or in the parameter of an ad hoc model chosen to represent uncertainty in the observable? How can a material cause change the size or magnitude of an unobservable, mathematical object like a parameter? The obvious answer is that it cannot, so that our ordinary understanding of cause in probability models is, at best, lacking. It follows that it has become too easy to ascribe cause between measures (“x”) and observables (“y”), which is a major philosophical failing of testing.

This is the true crisis. Tests based on p-values, or Bayes factors, or on any criteria revolving around parameters of models not only misunderstand cause, and mix up explanation and prediction, they also produce massive over-certainty. This is because it is believed that when a test has been passed, the model has been validated, or proved true in some sense, or if not proved true, then at least proved useful, even when the model has faced no external validation. If a test is passed, the theories that led to the model form in the minds of researchers are then embraced with vigor, and the uncertainty due to these theories dissolves. These attitudes have led directly to the reproducibility crisis, which is by now well documented; e.g. [19, 22, 60, 80, 84, 3].

Model usefulness or truth is in no way conditional on or proved by hypothesis tests. Even stronger, usefulness and truth are not coequal. A model may be useful even if it is not known to be true, as is well known. Now usefulness is not a probability concept; it is a matter of decision, and decision criteria vary. A model that is useful for one may be of no value to another; e.g. [42]. On top of its other problems, testing conflates decision and usefulness, assuming, because it makes universal statements, that decisions must have the same consequences for all model users.

Testing, then, must be examined in its role in the evidential crisis and whether it is a favorable or unfavorable means of providing evidence. It will be argued that it is entirely unfavorable, and that testing should be abandoned in all its current forms. Its replacements must provide an understanding of what explanation is and restore prediction and, most importantly, verification to their rightful places in modeling. True verification is almost non-existent outside the hard sciences and engineering, fields where it is routinely demanded that models at least make reasonable, verified predictions. Verification is shockingly lacking in all fields where probability models are the main results. We need to create or restore to probability and statistics the kind of reality-based modeling that is found in those sciences where the reality-principle reigns.

The purposes of this overview article are therefore to briefly outline the arguments against hypothesis testing and parameter-based methods of analysis, present a revived view of causation (explanation) that will in its fullness greatly assist statistical modeling, demonstrate predictive methods as substitutes for testing, and introduce the vital subject of model verification, perhaps the most crucial step. Except for demonstrating the flaws of classical hypothesis testing, which arguments are by now conclusive, the other areas are positively ripe with research opportunities, as will be pointed out.

2 NEVER USE HYPOTHESIS TESTS

The American Statistical Association has announced that, at the least, there are difficulties with p-values, [102]. Yet there is no official consensus on what to do about these difficulties, an unsurprising finding given that the official Statement on p-values was necessarily a bureaucratic exercise. This seeming lack of consensus is why readers may be surprised to learn that every use of a p-value to make a statement for or against the so-called null hypothesis is fallacious or logically invalid. Decisions made using p-values always reflect not probabilistic evidence, but are pure acts of will, as [76] originally criticized. Consequently, p-values should never be used for testing. Since it is p-values which are used to reject or accept (“fail to reject”) hypotheses in frequentism, because every use of p-values is logically flawed, it means that there is no logical justification for null hypothesis significance testing, which ought to be abandoned.

2.1 Retire P-values Permanently

It is not just that p-values are used incorrectly, or that their standard level is too high, or that there are good uses of them if one is careful. It is that there exists no theoretical basis for their use in making statements about null hypotheses. Many proofs of this are provided in [13] using several arguments that will be unfamiliar or entirely new to readers. Some of these are amplified below.

Yet it is also true that sometimes p-values seem to “work”, in the sense that they make, or seem to make, decisions which comport with common sense. When this occurs, it is not because the p-value itself has provided a useful measure but because the modeler himself has. This curious situation occurs because the modeler has likely, relying on outside knowledge, identified at least some causes, or partial causes, of the observable, and because in some cases the p-value is akin to a (loose) proxy for the predictive probabilities to be explained below.

Now some say (e.g. [4]) that the solution to the p-value crisis is to divide the magic number, a number which everybody knows and need not be repeated, by 10. “This simple step would immediately improve the reproducibility of scientific research in many fields,” say these authors. Others say (e.g. [45]) that taking the negative log (base 2) of p-values would fix them. But these are only glossy cosmetic tweaks which do not answer the fundamental objections.
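For reference, the two proposed adjustments amount to nothing more than the following arithmetic (R is used here only as a calculator):

    0.05 / 10       # the divide-by-ten proposal: a new threshold of 0.005
    -log2(0.05)     # the negative log (base 2) rescaling: about 4.3
    -log2(0.005)    # about 7.6

Neither transformation changes what a p-value is or what it is conditioned on.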

There is a large and growing body of critiques of p-values, e.g. [5, 39, 25, 98, 77, 100, 80, 1, ?, 81, 26, 46, 47, 57, ?]. None of these authorities recommend using p-values in any but the most circumscribed way. And several others say not to use them at all, at any time, which is also our recommendation; see [69, 99, 106, 59, 11].

There isn't space here to survey every argument against p-values, or even all the most important ones against them. Readers are urged to consult the references, and especially [13]. That article gives new proofs against the most common justifications for p-values.

2.2 Proofs of P-value Invalidity

Many of the proofs against p-values' validity are structured in the following way: calculation of the p-value does not begin until it is accepted or assumed the null is true: p-values only exist when the null is true. This is demanded by frequentist theory. Now if we start by accepting the null is true, logically there is only one way to move from this position and show the null is false. That is, if we can show that some contradiction follows from assuming the null is true. In other words, we need a proof by contradiction using a classic modus tollens argument:

• If “null true” is true then a certain proposition Q is true;
• ¬Q (this proposition Q is false in fact);
• Then “null true” is false; i.e. the null is false.

Yet there is no proposition Q in frequentist theory consistent with this kind of proof. Indeed, under frequentist theory, which must be adhered to if p-values have any hope of justification, the only proposition we know is true about the p-value is that assuming the null is true the p-value is uniformly distributed. This proposition (the uniformity of p) is the only Q available. There is no theory in frequentism that makes any other claim on the value of p except that it can equally be any value in (0, 1). And, of course, every calculated p (except in circumstances to be mentioned presently) will be in this interval. Thus what we actually have is:

• If “null true” then Q = “p ∼ U(0, 1)”;
• p ∈ [0, 1] (note the now-sharp bounds);
• Therefore...what?

First notice that we cannot move from observing p ∈ (0, 1), which is almost always true in practice, to concluding that the null is true (or has been “failed to be rejected”). This would be the fallacy of affirming the consequent. On the other hand, in the cases where p ∈ {0, 1}, which happens in practical computation when the sample size is small or when the number of parameters is large, then we have found that p is not in (0, 1), and therefore it follows that the null is false by modus tollens. But this is an absurd conclusion when p = 1. For any p ∈ (0, 1) (not-sharp bounds), it never follows that “null true” is false. There is thus no justification for declaring, believing, or deciding the null is true or false, except in ridiculous scenarios (p identical to 0 or 1).

Importantly, there is no statement in frequentist theory that says if the null is true, the p-value will be small, which would contradict the proof that it is uniformly distributed. And there is no theory which shows what values the p-value will take if the null is false. There is thus no Q which allows a proof by contradiction. Think of it this way: we begin by declaring “The null is true”; therefore, it becomes almost impossible to move from that declaration to concluding it is false.
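The uniformity claim is easy to check by simulation. The following R sketch uses an arbitrary two-sample t-test with a null that is true by construction; it is illustrative only.

    # Both groups come from the same normal, so the null is exactly true and
    # the resulting p-values are (approximately) uniform on (0, 1).
    set.seed(42)
    pvals <- replicate(1e4, t.test(rnorm(20), rnorm(20))$p.value)
    hist(pvals, breaks = 20, main = "p-values under a true null", xlab = "p")
    mean(pvals < 0.05)   # about 0.05 by construction, not evidence of anything

Small p-values are no more “expected” under a true null than large ones, which is exactly why no Q of the required kind is available.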

Other attempts at showing usefulness of the p-value, despite this uncorrectable flaw, follow along lines developed by [58], quoting John Tukey: “If, given A ⇒ B, then the existence of a small ε such that P(B) < ε tells us that A is probably not true.” As Holmes says, “This translates into an inference which suggests that if we observe data X, which is very unlikely if A is true (written P(X|A) < ε), then A is not plausible.”

Now “not plausible” is another way to say “not likely” or “unlikely”, which are words used to represent probability, quantified or not. Yet in frequentist theory it is forbidden to put probabilities to fixed propositions, like that found in judging model statement A. Models are either true or false (a tautology), and no probability may be affixed to them. P-values in practice are, indeed, used in violation of frequentist theory all the time. Everybody takes wee p-values as indicating evidence that A is likely true, or is true tout court. There simply is no other use of p-values. Every use therefore is wrong; or, on the charitable view, we might say frequentists are really closet Bayesians. They certainly act like Bayesians in practice.

For mathematical proof, we have that Holmes's statement translates to this:

Pr(A | X & Pr(X|A) = small) = small.   (1)

I owe part of this example to Hung Nguyen (personal communication). Let A be the theory “There is a six-sided object that on each activation must show only one of the six states, just one of which is labeled 6.” Let X = “2 6s in a row.” We can easily calculate Pr(X|A) = 1/36 < 0.05. Nobody would reject the “hypothesis” A based on this thin evidence, yet the p-value is smaller than the traditional threshold. And with X = “3 6s in a row”, Pr(X|A) = 1/216 < 0.005, which is lower than the newer threshold advocated by some. Most importantly, there is no way to calculate (1): we cannot compute the probability of A, first because theory forbids it, and second because there is no way to tie the evidence of the conditions to A. Arguments like this to justify p-values fail.
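The probabilities in this example are simple arithmetic, shown here in R for concreteness:

    (1/6)^2     # X = "2 6s in a row":   1/36  = 0.0278 < 0.05
    (1/6)^3     # X = "3 6s in a row":   1/216 = 0.0046 < 0.005
    (1/6)^100   # X = "100 6s in a row": about 1.5e-78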

A is the only theory under consideration, so A is all we have. If we use it, we assume it is true. It does not help to say we have an alternate in the proposition “A or not-A”, for that proposition is always true because it is a tautology, and it is always true regardless of whether A is true or false. What people seem to have in mind, then, are more extreme cases. Suppose X = “100 6s in a row”, so that Pr(X|A) ≈ 1.5 × 10⁻⁷⁸, a very small probability. But here the confusion of specifying the purpose of the model enters. What was the model A's purpose? If it was to explain or allow calculations, it has done that. Other models, and there are an infinite number of them, could better explain the observations, in the sense that these models could better match the old observations. Yet what justification is there for their use? How do we pick among them?

If our interest was to predict the future based on these past observations, that implies A could still be true. Everybody who has ever explained the gambler's fallacy knows this is true. When does the gambler's fallacy become false and an alternate, predictive model based on the suspicion the device might be “rigged” become true? There is no way to answer these questions using just the data! Our suspicion of device rigging relates to cause: we think a different cause is in effect than if A were true. Cause, or rather knowledge of cause, must thus come from outside the data (the X). This is proved formally below.

The last proofs against p-value use are not as intuitive, and also relate to knowledge of cause. We saw in Section 1 that spending on science was highly correlated to suicides. Many other spurious correlations will come to mind. We always and rightly reject these, even though formal hypothesis testing (using p-values or other criteria) says we should accept them. What is our justification for going against frequentist theory in these cases? That theory never tells us when testing should be adhered to and when it shouldn't, except to imply it should always be used. Many have developed various heuristics to deal with these cases, but none of them are valid within frequentism. The theory says “reject” or “accept (fail to reject)”, and that's it. The only hope is that, in the so-called long run (when, as Keynes said, “we shall all be dead”), the decisions we make will be correct at theoretically specified rates. The theory does not justify the arbitrary and frequent departures from testing that most take. That these departures are anyway taken signals the theory is not believed seriously. And if it is not taken seriously, it can be rejected. More about this delicate topic is found in [50, 52, 11].

Now regardless of whether the previous argument is accepted, it is clear we are rejecting the spurious correlations because we rightly judge there is no causal connection between the measures, even though the “link” between the measures is verified by wee p-values. Let us expand that argument. In, for example, generalized linear models we begin modeling efforts with

µ = g⁻¹(β1x1 + · · · + βpxp),

where µ is a parameter in the distribution said to represent uncertainty in observable y, g is some link function, and the xi are explanatory measures of some sort, connected through g to µ via the coefficients βi.
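As a concrete instance of this schematic form, the following minimal R sketch fits a Poisson model with a log link; the simulated data and coefficient values are chosen purely for illustration.

    # mu = g^{-1}(beta_1*x_1 + ... + beta_p*x_p) with g^{-1} = exp (log link).
    set.seed(3)
    x1 <- rnorm(100); x2 <- rnorm(100)
    y  <- rpois(100, lambda = exp(0.5 + 0.8 * x1 - 0.3 * x2))
    fit_glm <- glm(y ~ x1 + x2, family = poisson(link = "log"))
    coef(fit_glm)   # the estimated beta_i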

What happened to xp+1, xp+2, · · · ? An infinity of x have been tacitly excluded without benefit of hypothesis tests. This may seem an absurd point, but it is anything but. We exclude in models for observable y such measures as “The inches of peanut butter in the jar belonging to our third-door-down neighbor” (assuming y is about some unrelated subject) because we recognize, as with the spurious correlations, that there can be no possible causal connection between a relative stranger's peanut butter and our observable of interest.

Now these rejections mean we are willing to forgo testing at some times. There is nothing in frequentism to say which times hypothesis testing should be rejected and which times it must be used, except, as mentioned, to suggest it always must be used. Two people looking at similar models may therefore come to different conclusions: one claiming a test is necessary to verify his hypothesis, the other rejecting the hypothesis out of hand. Then it is also true many would keep an xj in a model even if the p-value associated with it is large if there is outside knowledge this xj is causally related in some way to the observable. Another inconsistency.

So not only do we have proof that all uses of p-values are nothing except expressions of will, we have that the testing process itself is rejected or accepted at will. There is thus no theoretical justification for hypothesis testing—in its classical form.

There are many other arguments against p-values that will be more familiar to readers, such as how increasing sample size lowers p-values, and that p-value “significance” is in no way related to real-world significance, and so on for a very long time, but these are so well known we do not repeat them, and they are anyway available in the references.

There is however one special, or rather frequent, case in economics and econometrics, where it seems testing is not only demanded, but necessary, and that is in so-called tests of stationarity. A discussion of this problem is held in abeyance until after cause has been reviewed, because it is impossible to think about stationarity without understanding cause. The answer can be given here, though: testing is not needed.

We now move to the replacement for hypothesis tests, where we turn the subjectivity found in p-values to our benefit.

3 MODEL SELECTION USING PREDICTIVE STATISTICS

The shift away from formal testing, and parameter-based inference, is called for in, for example, [44]. We echo those arguments and present an outline of what is called the reality-based or predictive approach. We present here only the barest bones of predictive, reality-based statistics. See the following references for details about predictive probabilities: [24, 37, 38, 61, 62, 66, 14, 12]. The main benefits of this approach are that it is theoretically justified wholly within probability theory, and therefore has no arbitrariness to it, that unlike hypothesis testing it puts questions and answers in terms of observables, and that it better accords with the true uncertainty inherent in modeling. Hypothesis testing exaggerates certainty through p-values, as discussed above.

Since the predictive approach won't be as familiar as hypothesis testing, we spend a bit more time up front before moving to how to apply it to complex models.

3.1 Predictive Probability Models

All probability models fit into the following schema:

Pr(y ∈ s|M), (2)

where y is the observable of interest (the dimension will be assumed by the context), s a subset of interest, so that “y ∈ s” forms a verifiable proposition.

We can, at least theoretically, measure its truth or falsity. That is, with s specified, once y is observed this proposition will either be true or false; the probability it is true is predicated on M, which can be thought of as a complex proposition. M will contain every piece of evidence considered probative to y ∈ s. This evidence includes all those premises which are only tacit or implicit or which are logically implied by accepted premises in M. Say M insists uncertainty in y follows a normal distribution. The normal distribution with parameters µ and σ in this schema is written

Pr(y ∈ s|normal(µ, σ)) = (1/2) [erf((s2 − µ)/(σ√2)) − erf((s1 − µ)/(σ√2))],   (3)

where s2 is the supremum and s1 the infimum of s when s ∈ R, and assuming s is continuous. In real decisions, s can of course be any set, continuous or not, relevant to the decision maker. M is the implicit proposition “Uncertainty in y is characterized by a normal distribution with the following parameters.” Also implicit in M are the assumptions leading to the numerical approximation to (3), because of course the error function is not analytic (erf(x) = (2/√π) ∫₀ˣ e^(−t²) dt). Since these approximations vary, the probability of y ∈ s will also vary, essentially creating new or different M for every different approximation. This is not a bug, but a feature. It is also a warning that it would be better to explicitly list all premises and assumptions that go into M so that ambiguity can be removed.
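Equation (3) is just the difference of two normal distribution functions written via erf. A minimal R check, with arbitrary illustrative values for µ, σ, and s = (s1, s2):

    mu <- 10; sigma <- 2
    s1 <- 9;  s2 <- 12
    p_normal <- pnorm(s2, mu, sigma) - pnorm(s1, mu, sigma)
    # The same probability via the erf form of (3):
    erf   <- function(x) 2 * pnorm(x * sqrt(2)) - 1
    p_erf <- 0.5 * (erf((s2 - mu) / (sigma * sqrt(2))) -
                    erf((s1 - mu) / (sigma * sqrt(2))))
    c(p_normal, p_erf)   # identical up to floating point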

It must be understood that each calculation of Pr(y ∈ s|Mi) for every different Mi is correct and true (barring human error). The index is arbitrary and ranges over all M under consideration. It might be that a proposition in a particular Mj is itself known to be false, where it is known to be false conditioned on premises not in Mj: if this knowledge were in Mj it would contradict itself. But this outside knowledge does not make Pr(y ∈ s|Mj) itself false or wrong. That probability is still correct assuming Mj is correct. For instance, consider M1 = “There are 2 black and 1 red balls in this bag and nothing else and one must be drawn out”, Pr(black drawn|M1) = 2/3, and this probability is true and correct even if it is discovered later that there are 2 and not 1 red balls (= M2). All our discovery means is that Pr(black drawn|M1) ≠ Pr(black drawn|M2). This simple example should be enough to clear up most controversies over prior and model selection, as explained below.

It is worth mentioning that (3) holds no matter what value of y is observed. This is because, unless as the case may be s ≡ R or s ≡ ∅,

Pr(y ∈ s|normal(µ, σ)) ≠ Pr(y ∈ s|y, normal(µ, σ)).

The probability of y ∈ s conditioned on observing the value of y will be extreme (either 0 or 1), whereas the probability of y ∈ s not conditioning on knowing the value will not be extreme (i.e. in (0, 1)). We must always keep careful track of what is on the right side of the conditioning bar |.

It is usually the case that values for parameters, such as in (3), are not known. They may be given or estimated by some outside method, and these methods of estimation are usually driven by conditioning on observations. In some cases, parameter values are deduced; e.g. such as knowledge that µ in (3) must be 0 in some engineering example. Whatever is the case, each change in estimate, observation, or deduction results in a new M. Comparisons between probabilities are thus always comparisons between models. Which model is best can be answered by appeal to the model only in those cases where the model itself has been deduced from premises which are either true themselves, or are accepted as true by those interested in the problem at hand. These models are rare enough, but they do exist; see [11] (Chapter 8) for examples. In other cases, because most models are ad hoc, appeal to which model is best must come via exterior methods, such as by the validation process, as demonstrated below.

In general, models are ad hoc, being used for the sake of convenience or because of custom. Consider the normal model used above. If the parameters are unknown, guesses may be made, labeled µ̂ and σ̂. It is in general true then that Pr(y ∈ s|normal(µ, σ)) ≠ Pr(y ∈ s|normal(µ̂, σ̂)), with the equality occurring (for any s) only when µ̂ = µ and σ̂ = σ. Thus we might say M1 = normal(µ, σ) (where the values may be known or supplied) and M2 = normal(µ̂, σ̂) (where guesses are used). Again, the guesses are usually driven by methods applied to observations. Maximum likelihood estimation is common enough. So that it would be better to write

M2 = normal(MLE(µ, σ, Dn)),

where Dn indicates the n previous observations of y (the data). Method of moments is another technique, so that we might write

M3 = normal(MOM(µ, σ, Dn)),

And in general Pr(y ∈ s|M2) ≠ Pr(y ∈ s|M3). Both of these probabilities (for any s) are correct. Whether or not one is better or more useful than another we have yet to answer. Importantly, we then have, for example,

M4 = normal(MLE(µ, σ, Dn+m)),

where M2 is identical to M4 except for the addition (or even subtraction) of m observations. Again, in general, Pr(y ∈ s|M2) ≠ Pr(y ∈ s|M4). However, it is usually accepted that adding more observations provides better estimates, so that it follows M4 is superior to M2. However, it might not be that M4 is more useful than M2. Usefulness is not a probability or statistical concept: truth is, and we have already proven the probabilities supplied by all these instances are correct. About usefulness, more presently.

The predictive model selection process begins in this example like this:

Pr(y ∈ s|M4(Dn+m)) = p + δ,   (4)

Pr(y ∈ s|M2(Dn)) = p.   (5)

Following [64], we say the additional observations m are relevant if δ ≠ 0, and irrelevant if δ = 0. There are obvious restrictions on the value of δ, i.e. δ ∈ [−1, 1] (the bounds may or may not be sharp depending on the problem). If adding new observations does not change the probability of y ∈ s, then adding these points has provided no additional usefulness. Relevance, as is clear, depends on s as well as M. Adding new observations may be relevant for some s (say, in the tails) but not for others (say, near the median); i.e. δ = δ(s). As the s of interest themselves depend on the decisions made by the user of the probability model, relevance cannot be totally a probability concept, but must contain aspects of decision. The form (4) would be useful in developing sample size calculations, which are anticipated to be similar to Bayesian sample size methods, e.g. [70]. This is an open area of research.
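A minimal R sketch of (4)-(5), using simulated observations and a simple plug-in normal model as a stand-in for M2 and M4 (a full treatment would integrate over parameter uncertainty, as in (8) below):

    set.seed(7)
    y_n   <- rnorm(30, mean = 5, sd = 2)    # the original n observations
    y_m   <- rnorm(10, mean = 5, sd = 2)    # m additional observations
    y_all <- c(y_n, y_m)
    # Plug-in predictive probability of y > 8 under each fitted normal model.
    p_M2  <- 1 - pnorm(8, mean(y_n),   sd(y_n))
    p_M4  <- 1 - pnorm(8, mean(y_all), sd(y_all))
    delta <- p_M4 - p_M2
    c(p_M2, p_M4, delta)

Whether the resulting δ matters is, as just said, a matter of decision and not of probability.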

The size of the critical value δ is also decision dependent. No universal value exists, or should exist, as with p-values. The critical value may not be constant but can depend on s; e.g. large relative changes in rarely encountered y may not be judged important. Individual problems must “reset” the value of δ(s) each time.

As useful as the form (4) will be to planning experiments, it is of more interest to use it with traditional model selection. Here then is a more general form:

Pr(y ∈ s|M1) = p + δ,   (6)

Pr(y ∈ s|M2) = p.   (7)

This is, of course, identical in form to (4), which shows the generality of the predictive method. M1 may be any model that is not logically deducible from M2; if M2 is deducible from M1, then δ ≡ 0. What's perhaps not obvious is that (6) can be used both before and after data is taken: before data, it is akin to (4); after, we have genuine predictive probabilities.

A Bayesian approach is assumed here, though a predictive approach under frequentism can also be attempted (but not recommended). We have the schematic equation

Pr(y ∈ s|Mθ) = ∫_Θ Pr(y ∈ s|θMθ) Pr(θ|Mθ) dθ.   (8)

Here M is a parameterized probability model, possibly containing observations; and of course it also contains all (tacit, explicit and implicit) premises used to justify the model. If (8) is calculated before taking data, then Pr(θ|Mθ) is the prior distribution and Pr(y ∈ s|θMθ) the likelihood. In this case, (8) is the prior predictive distribution (allowing s to vary, of course). If the calculations are performed after data has been taken, Mθ is taken to represent the model plus data, and thus Pr(θ|Mθ) becomes the posterior, and (8) becomes the posterior predictive distribution.

All models have a posterior predictive form, though most models are not “pushed through” to this final form. See [6] for derivation of predictive posteriors for a number of common models, computed with so-called reference priors. Now here it is worth saying that (8) with an Mθ which assumes as part of its list of premises a certain prior is not the same as another posterior predictive distribution with the same model form but with a different prior. Some are troubled by this. They should not be. Since all probability is conditional on the assumptions made, changing the assumptions changes the probability. Again, this is not a bug, but a feature. After all, if we change the (almost always) ad hoc model form we also change the probability, and this is never bothersome. We have already seen that adding new data points also changes the probability, and nobody ever balks at that, either. It follows below that everything said about comparing models with respect to measures xi applies to comparing models with different priors.

The immediate and obvious benefit of (8) is that direct, verifiable, reality-based probabilistic predictions are made. Since y is observable, we have a complete mechanism in place for model specification and model verification, as we shall soon see.

It is more usual to write (8) in a form which indicates both past data and potentially helpful measures x. Thus

Pr(y ∈ s|XDnMθ) = ∫_Θ Pr(y ∈ s|θXDnMθ) Pr(θ|XDnMθ) dθ,   (9)

where X = (x1, x2, · · · , xp) represents new or assumed values of measures x, and Dn = (yn, Xn) are the previous n observations (again, the dimension of y etc. will be obvious in context). Usually or by assumption Pr(θ|XDnMθ) = Pr(θ|DnMθ).

For model selection, which assumes, but need not assume, the same prior and observations, but which must choose between x, we finally have

Pr(y ∈ s|XDnM1) = p + δ,   (10)

Pr(y ∈ s|XDnM2) = p.   (11)

Everything said above about δ applies here, too. Briefly, addition or subtraction of any number of x from one model to the other will move δ away from 0. The distance it moves is a measure of importance of the change, but only with respect to a decision made by the user of the model. At some s, a |δ| > 0.01 may be crucial to one decision maker but useless to a second, who may require in his application a |δ| > 0.20 before acting. This assumes the same models and same observations for both decision makers—and same s. The predictive approach has thus removed a fundamental flaw of hypothesis testing, which set one number for significance for all problems and decisions everywhere. There was no simple way in hypothesis testing to use a p-value in decisions with different costs and losses; yet predictive probability is suited for just this purpose. And since these are predictive probabilities, the full uncertainty of the model and data is accounted for, which fixes a second fundamental problem of hypothesis tests, which revolved around unobservable parameters. Recall that we can be as certain as possible of parameter values but still wholly uncertain in the value of the observable. This point is almost nowhere appreciated, but it becomes glaringly obvious when data is analyzed.
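The following R sketch illustrates (10)-(11) with simulated data: the same predictive question is put to two models that differ only in whether a measure x2 is included. Plug-in normal predictions from lm() are used for brevity; a full treatment would use the posterior predictive distribution (9).

    set.seed(11)
    n  <- 200
    x1 <- rnorm(n); x2 <- rnorm(n)
    y  <- 1 + 2 * x1 + 0.1 * x2 + rnorm(n)
    m1 <- lm(y ~ x1 + x2)   # "M1": with x2
    m2 <- lm(y ~ x1)        # "M2": without x2
    new_x <- data.frame(x1 = 1, x2 = 1)   # the X at which the decision is made
    s <- 4                                # the decision-relevant event: y > s
    p1 <- 1 - pnorm(s, predict(m1, newdata = new_x), summary(m1)$sigma)
    p2 <- 1 - pnorm(s, predict(m2, newdata = new_x), summary(m2)$sigma)
    c(p1, p2, delta = p1 - p2)   # whether |delta| matters is up to the decision maker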

3.2 Comparing Other Model Selection Measures

Now (10) shares certain similarities with Bayes factors. These may be written in this context as

BF = Pr(Dn|M1, E) / Pr(Dn|M2, E) = [Pr(M1|Dn, E) / Pr(M2|Dn, E)] × [Pr(M2|E) / Pr(M1|E)].   (12)

This form implies there must be outside or extra-model evidence E from which we could assess Pr(M1|E) and Pr(M2|E). If E suggests these are the two and the only two models under consideration, then it follows Pr(M1|E) = 1 − Pr(M2|E). It is difficult to see what this E might be, except for the simple case where the obvious deduction Pr(M1|E) = Pr(M2|E) = 1/2 is made. But then that would seem to hold for any two models, regardless of the number of changes made between models; i.e. we make the same deduction whether one parameter changes between models or whether there are p > 1 changes. Of course, such problem-dependent E might very well exist, and in practice they always do, as the infinity of hypotheses example above proved. In those cases where it is explicit it should be used if the decision is to pick one model over another.

The BF also violates the predictive nature of the problem. It asks us to choose a model as the once and final model, which is certainly a sometime goal, but the measure here is not predictive in terms of the observable. If our interest is predictive, and with any finite n, there will still be positive probability for either model being true. We need not choose in these cases, but can form a probability weighted average of y ∈ s assuming both models might be true. This is another area insufficiently investigated: why throw away a model that may be true when what one really wants are good predictions?

We can in any case see that the BF exaggerates differences as hypothesis tests did, though not in the same manner. For one, the explicit setting of s is removed in the BF, whereas in the predictive δ = δ(s) it is assumed any model may be judged useful for some s and not for others. The BF sort of performs an average over all s. A large (or small) BF value may be seen, but because we're comparing ratios of probabilities the measure is susceptible to swings in the denominator toward 0. Of course, the predictive (10) can be written in ratio form, i.e. (p + δ)/p, and this may in some cases be helpful in showing how predictive measures are related to Bayes factors and other information criteria, such as the Bayesian Information Criterion, AIC, minimum message length, and so on. All these are open research questions. However, only the predictive method puts the results in a form that is directly usable and requires no additional interpretation by model users. This latter benefit is enormous. How much easier it is to say to a decision maker, “Given our model and data, the probability for y ∈ s is p” than to say “Given our data and that we accept our model is false, then we expect to see values of this ad hoc statistic p × 100% of the time, if we can repeat the experiment that generated our data an infinite number of times”? The question answers itself.

How to pick the s? They should always be related to the decisions to be made, and so s will vary for different decision makers. However, it should be the case that for some common problems natural s arise. This too is an open area of research.

3.3 Example

Here is a small example, chosen for its ease in understanding. The Boston Housing dataset is comprised of 506 observations of Census tract median house prices (in $1,000s), along with 13 potential explanatory measures, the most interesting of which is nox, the atmospheric nitric oxides concentration (parts per 10 million), [54]. The idea was that high nox concentrations would be associated with lower prices, where “associated” was used as a causal word. To keep the example simple yet informative, we only use some of the measures: crim, per capita crime rate by town; chas, Charles River border indicator; rm, average number of rooms per dwelling; age, proportion of owner-occupied units built prior to 1940; dis, weighted distances to five Boston employment centres; tax, full-value property-tax rate; and b, a function of the proportion of blacks by town. The dataset is available in the R package mlbench. All examples in this paper use R version 3.4.4, and the Bayesian computation package rstanarm version 2.17.4 with default priors (see footnote a).

The original authors used regression of price on the given measures. The ordinary ANOVA table is given in Table 1.

a Code for all examples is available at http://wmbriggs.com/post/26313/

Table 1 The ANOVA table for the linear regression of median house prices (in $1,000s) on a group of explanatory measures. All variables would pass ordinary hypothesis tests.

                 Estimate    Std. Error   t value   Pr(>|t|)
 (Intercept)   -11.035298     4.193677    -2.631    0.00877
 crim           -0.110783     0.036748    -3.015    0.00270
 chas1           4.043991     1.010563     4.002    7.25e-05
 nox           -11.068707     4.145901    -2.670    0.00784
 rm              7.484931     0.381730    19.608    < 2e-16
 age            -0.068859     0.014473    -4.758    2.57e-06
 dis            -1.146715     0.207274    -5.532    5.12e-08
 tax            -0.006627     0.002289    -2.895    0.00395
 b               0.012806     0.003142     4.076    5.33e-05

The posterior distribution of the parameters of the regression mimics the evidence in the ANOVA table. These aren't shown because the interest is not on parameters, but observables. Most researchers would be thrilled by the array of wee p-values, figuring the model must be on to something. We shall see this hope is not realized.

What does the predictive analysis show? That's a complicated question because there is no single, universal answer like there is in hypothesis-testing, parameter-centric modeling. This makes the method more burdensome to implement, but since the predictive method can answer any question about observables put to it, its generality and potential are enormous.

We cannot begin without asking questions about observables. This is implicit in the classical regression, too, only the questions there have nothing directly to do with observables, and so nobody really cares about them. Here is a question which I thought interesting. It may be of no interest to any other decision maker, all of whom may ask different questions and come to different judgments of the model.

The third quartile observed housing price was $35,000. What is the predicted probability prices would be higher than that given different levels of nox for data not yet observed? The answer for old data can be had by just looking. In order to answer that, we also have to specify values for crim, chas, and all the other measures we chose to put into the model. I picked median observed values for all. The stan_glm method was used to form a regression of the same mathematical form as the classic procedure, and the posterior_predict method from that package was used to form posterior predictive distributions, i.e. eq. (9). These are solved using resampling methods; for ease of use and explanation the default values on all methods were used.
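A sketch of the computation just described follows; it is consistent with the description above but is not the author's code, which is linked in footnote a. The rstanarm defaults are kept, chas is held at its more common level (0), and the other measures at their observed medians.

    library(mlbench)    # BostonHousing data
    library(rstanarm)   # stan_glm, posterior_predict

    data("BostonHousing")
    d <- BostonHousing

    fit <- stan_glm(medv ~ crim + chas + nox + rm + age + dis + tax + b,
                    data = d, family = gaussian(), refresh = 0)

    # Pr(price > $35,000 | X, D, M) as nox varies over its observed range.
    nox_grid <- seq(min(d$nox), max(d$nox), length.out = 50)
    newX <- data.frame(crim = median(d$crim),
                       chas = factor("0", levels = levels(d$chas)),
                       nox  = nox_grid,
                       rm   = median(d$rm),
                       age  = median(d$age),
                       dis  = median(d$dis),
                       tax  = median(d$tax),
                       b    = median(d$b))

    pp      <- posterior_predict(fit, newdata = newX)   # draws x 50 predicted prices
    pr_high <- colMeans(pp > 35)                         # Pr(price > $35,000) at each nox
    plot(nox_grid, pr_high, type = "l",
         xlab = "nox (ppm)", ylab = "Pr(Price > $35,000 | XDM)")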

Fig. 1 shows the relevance plot for the models with and without nox. This is the predictive probability of housing prices greater than $35,000 with all measures set at their median value, and with nox varying from its minimum to maximum observed values. The lines are not smooth because they are the result of a resampling process; larger resamples would produce smoother figures; however, these are adequate for our purposes.

Fig. 1 Relevance plot, or the predictive probability of housing prices greater than $35,000, using models with nox (black line) and without (blue). All other measures are set at their median value.

The predictive probability of high housing prices goes from about 4% with the lowest levels of nox, to something near 0% at the maximum nox values. The predictive probability in the model without nox is about 1.8% on average. The original p-value for nox was 0.008, which all would take as evidence of a strong effect. Yet for this question the probability changes are quite small. Are these differences (a ±2% swing) in probability enough to make a difference to a decision maker? There is no single answer to that question. It depends on the decision maker. And there would still not be an answer until it was certain the other measures were making a difference. Now experience with the predictive method shows that often a measure will be predictively useful yet give a large p-value; we also see cases where a measure shows a wee p-value but does not provide any real use in predictive situations. Every measure has to be checked (and this is easily automated). We don't do this here because it would take us too far afield.

What might not be clear but needs to be is that we can make predictions for any combination of X, for any function of Y. Usefulness of X (any of its constituent parts) is decided with respect to these functions of Y, which in turn are demanded by the decisions to be made. Usefulness is in-sample usefulness, with the real test of any model being verification, which is discussed below. Anticipating that, we have a most useful plot, which is the probability prediction of Y for every old observed X, supposing that old X were new. This is shown in Fig. 2.

Fig. 2 The probability prediction of housing prices for every old observed X, supposing that old X were new. The vertical red dashed line indicates prices we know to be impossible based on knowledge exterior to M.

Price (s) is on the x-axis, and the probability of future prices less than s, given the old data and M, is on the y-axis. A dotted red line at $0 is shown. Now we know, based on knowledge external to M, that it is impossible for prices to be less than $0. Yet the model far too often gives positive probabilities for impossible prices. The worst prediction is about a 65% chance for prices less than $0. I call this phenomenon probability leakage, [9]. Naturally, once this is recognized, M should be amended. Yet it never would be recognized using hypothesis testing or parameter estimation: the flaw is only revealed in the predictive form. An ordinary regression is inadequate here. I do not pursue other models here, as they are not the point. What should be fascinating is the conjecture that many, many models in economics, if they were looked at in their predictive sense, would show leakage, and when they do it is another proof that the ordinary ways of examining model performance generate over-certainty. For as exciting as the wee p-values were in the ANOVA table above, the excitement was lessened when we looked for practical differences in knowledge of nox. And that excitement turned to disappointment when it was learned the model had too much leakage to be useful in a wide variety of situations.
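Leakage is straightforward to check from the same posterior predictive draws; a sketch, reusing the fitted model from the earlier sketch:

    # Pr(price < $0 | X, D, M) at every observed X, treated as if new.
    pp_all  <- posterior_predict(fit, newdata = d)   # draws x 506 predicted prices
    leakage <- colMeans(pp_all < 0)
    max(leakage)        # the worst case; any value above 0 is leakage
    mean(leakage > 0)   # fraction of tracts showing some leakage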

We have only sketched the many opportunities in predictive methods research. How the predictive choices fit in with more traditional informational criteria, such as given in [65], is largely unknown. It seems clear the predictive approach avoids classic “paradoxes”, however, such as for instance given in [67], since the focus in prediction is always on observables and their probabilities.

It should also be clear that nox, useful or not, could not cause a change in housing prices. In order to be a cause, nox would somehow have to seep into realtors' offices and push list prices up or down. This is not a joke, but a necessary condition for nox being an efficient cause. Another possibility is that buyers' or sellers' perception of nox caused them to raise or lower prices. This might happen, but it's scarcely likely: how many home owners can even identify what nox is? So it might be that nox causes other things that, in turn or eventually, cause prices to change. Or it could be that nox has nothing to do with the cause of price changes, and that its association with price is a coincidence or the result of “confounders.” There is no way to tell by just examining the data. This judgment is strict, and is proven in the next Section.

4 Y CAUSE?

This section is inherently and necessarily more philosophical than the others. It addresses a topic scientists, the closer their work gets to statistics, are more accustomed to treating cavalierly and, it will be seen, sloppily. As such, some of the material will be entirely new to readers. The subject matter is vast, crucial, and difficult. Even though this is the longest section, it contains only the barest introduction, with highlights of the biggest abuses in causal ascription common in modeling. One of the purposes of models we admitted was explanation, and cause is an explanation. There are only two possibilities of cause in any model: cause of change in the observable y or cause of change in the unobservable, non-material parameters in the model used to characterize uncertainty in y. So that when we speak of explanation, we always speak of cause. Cause is the explanation of any observable, either directly or through a parameter. Since models speak of cause, or purport to, we need to understand exactly where knowledge of cause arises: in the data itself through the model, or in our minds. We must understand just what cause means, and we must know how, when, and if we really can identify cause. It will be seen that, once again, classic data analysis procedures lead to over-certainty.

4.1 Be Cause

There is great confusion about the role cause and knowledge of cause play in statistical and econometric models, e.g. [43, 97], which are all species of probability models. I include in this designation all artificial intelligence and so-called machine learning algorithms whose outputs, while they may be point predictions, are not meant to be taken literally or with absolute certainty, implying uncertainty is warranted in their predictions; hence they are non-standard forms of probability models. We may say all such algorithms, from the statistical to the purely computational, are uncertainty models. Any model which produces anything but certain and known-to-be-certain predictions is an uncertainty model in the sense that probability, quantified or not, must be used to grasp the output.

Can probability or uncertainty models discover cause? Always or never, or only under certain circumstances? What does cause mean? These are all large topics, impossible to cover completely in a small review article. So here we have the limited goal of exploring the confusing and varying nature of cause in probability models in common use, and of contrasting modern Humean, Popperian, and Cartesian notions of cause with the older but resurgent Aristotelian ideas of cause that, it will be argued, should be embraced by researchers for the many benefits they contain.

For good or bad, and mostly a whole lot of bad, causal language is often used to convey results of probability models; see [49]. Results which are merely probable are far too often taken as certain. [36] relates a history of attempts at assigning cause in linear models, beginning with Yule in 1899. These attempts have largely been similar: they begin by specifying a parameterized linear model. One or more of the parameters are then taken to be effects of causes. Parameter estimates are often called "effect size", though the causes thought to generate these effects are not well specified. Models are often written in causal-like form (to be described below), or cause is conceived by drawing figurative lines or "paths" between certain parameters. The conclusion that cause exists or does not exist depends on the signs of these parameters' estimates. [83] has a well known book which purports to design strategies for automatically identifying causes. [27] investigate the design of experiments which are said to lead to causal identification. It will be argued below that these are vain hopes, as algorithms cannot understand cause.

What's hidden in these works is the tacit assumption, shared by frequentist, Bayesian, and computer modeling efforts, that cause and effect can always be quantified, or quantified with enough precision to allow cause to be gleaned from models. This unproved and really quite astonishingly bold assumption doubtless flows from the notion that to be scientific means to be measurable; see [28]. It does not follow, however, that everything can be measured. And indeed, since Heisenberg and Bell, [92], we have known that some things, such as the causes of certain quantum mechanical events, cannot be measured, even though some of the effects can be and are. We therefore know of cases where knowledge of cause is impossible. Let us see whether these cases multiply.

Now cause is a philosophical, or metaphysical, concept. Many scientists tend to view philosophy with a skeptical eye; see e.g. [94]. But even saying one has no philosophy is a philosophy, and the understanding of cause, or even the meaning of any probability, requires a philosophy, so it is well to study what philosophers have said on the subject of cause, and see how that relates to probability models.

The philosophy we adopt here is probabilistic realism, which flows from the position of moderate realism; see [82, 32]. It is the belief that the real world exists and is, in part, knowable; it is the belief that material things exist and have form, and that form can exist in things or in minds (real triangles exist, and we know the form of triangle even in the absence of real triangles). In mathematics, this is called the Aristotelian Realist philosophy; see [35] for a recent work. This is in contrast to the more common Platonic realism, which holds that numbers and the like exist as forms in some untouchable realm, and to nominalism, which holds that no forms exist, only opinion does; see [89] for a history. The moderate realist position is another reason we call the approach in this paper reality-based probability. Probability does not exist as a thing, as a Platonist would hold, but as an idea in the mind. Probability is thus purely epistemological. This is not proved here, but [11] is an essential reference.

What is cause? [56] opens his article on probabilistic causation by quoting Hume's An Enquiry Concerning Human Understanding: "We may define a cause to be an object, followed by another, and where all the objects similar to the first, are followed by objects similar to the second." This seemingly straightforward theory—for it is a theory—led Hume, through the words followed by another, ultimately to skepticism, and to his declaration that cause and event were "loose and separate". Since many follow Hume, our knowledge of cause is often said to be suspect. Cause and effect are seen as loose and separate because the phrase followed by cuts the link of cause from effect. The skepticism about cause in turn led to skepticism about induction, which is wholly unfortunate since our surest knowledge, such as that about mathematical axioms, can only come from inductive kinds of reasoning; there are at least five kinds of induction. The book by [48] is an essential reference. Skepticism about induction led, via a circuitous route through Popper and the logical positivists, to hypothesis testing, and all its associated difficulties; see the histories in [20, 8]. However, telling that story would take us too far afield; interested readers can consult [11] (Chapter 4) and [95, 104] about induction, and Briggs (Chapter 5) and [13] about induction and its relations to hypothesis testing.

In spite of all this skepticism, which pervades many modern philosophical accounts of causation and induction, scientists retained notions of cause (and induction). After all, if science was not about discovering cause, what was it about? Yet if scientists retained confidence in cause, they also embraced Hume's separation, which led to curious interpretations of cause. Falsification, a notion of Popper's, even though it is largely discredited in philosophical circles ([94, 96]), is still warmly embraced by scientists, even to the extent that models said not to be falsifiable are deemed not scientific.

It's easy to see why falsification is not especially useful, however. If, for example, a model of a single numerical observable says, as many probability models do say, that the observable can take any value on the real line with non-zero probability, no matter how small that probability (think of a normal model), then the model may never be falsified on any observation. Falsification can only occur where a model says, or it is implied, that a certain observable is impossible—not just unlikely, but impossible—and we subsequently see that observable. Yet even in physical models when this happens in practice, which is rare, the actual falsification is still not necessarily accepted, because the model's predictions are accompanied by a certain amount of "fuzz", [23]; that is, the predictions are not believed to be perfectly certain. With falsification, as with testing, many confuse probability with decision.

Another tacit premise in modern philosophies is that cause is limited to efficient causality, described loosely as that which makes things happen. This limitation followed from the rejection of classical, Aristotelian notions of cause, which partitioned cause into four parts: (1) the formal, or form of a thing; (2) the material, or stuff which is causing or being affected; (3) the efficient cause; and (4) the final cause, the reason for the cause, sometimes called the cause of (the other) causes. See [31] for a general overview. For example, consider an ashtray: the formal cause is the shape of the ashtray, the material cause is the glass comprising it, the efficient cause the manufacturing process used to create it, and the final cause the purpose, which is to hold ashes. The reader would benefit from thinking how the fullness of cause explains observables of interest to him.

Final causation is teleological, and teleology is looked on askance by many scientists and philosophers; biologists in particular are skittish about the concept, perhaps fearing where embracing teleology might lead; e.g. [68]. Whatever its difficulties in biology, teleology is nevertheless a crucial element in assessing causes of willed actions, which are by definition directed, and which of course include all economic actions, e.g. [105]. Far from being rarefied, all these distinctions are of the utmost importance, because we have to know just which part of a cause is associated with what parameter in a probability model, as discussed momentarily. This is an area of huge research opportunity. For instance, is the parameter representing the efficient cause, or the material? Or the formal or final? Because cause is believed in modern research to be one thing, over-certainty again arises.

The modern notion of cause, as stated above, is that a cause, a power of some kind, acts, and then at some future point an effect occurs. The distance in time of this separation of cause and effect is never exactly specified, which should raise a suspicion that something from the description is missing. It has led to the confusion that time series (in the formal mathematical sense) might be causal. In any case, it is acknowledged that efficient causes operate on material things which possess something like a form, though the form of the material thing is also allowed to be indistinct, meaning it is accepted that the efficient cause may change the form of a thing, which nevertheless remains the same thing, at least in its constituent parts. The inconsistency is that the form of the thing describes its nature; when a form is lost or changed to a new form, the old form is gone.

Contrast this confusing description with the classical definition of substantial form; essential references are [34, 82, 32]. Substances, i.e. things, are composed of material plus form. A piece of glass can take the form of a window or an ashtray. Substances, which are actual, also have or possess potentiality: to be here rather than there, to be this color rather than that, to not exist, and so forth. Potentiality is thus part of reality. Potentiality becomes actuality when the substance is caused to change. The principle of (efficient) causality (accepted by most philosophers) states that the reduction of potency to actuality requires something actual. A change from potentiality to actuality is either in essence, when something comes to be or passes out of existence, or in accident, when something about a thing changes, such as its position or some other observable component, but where the substance retains its essence (a piece of glass moved is still a piece of glass, unless it is broken or melted, in which case its essence has changed). A probability model quantifies potentiality in this sense. This is an active area of research in quantum mechanics and probability; see [63, 90].

Cause is ontological: it changes a thing's being or accidents, which are defined as those properties a substance has which are not crucial for its essence, i.e. what it is. A red house painted white is still a house. It is everywhere assumed the principle of sufficient reason holds, which states that every thing that exists has a reason or explanation for its existence. In other words, events do not happen for "no reason"; the idea that things happen for "no reason" is, after all, an extremely strong claim. Now we can be assured that there are sufficient reasons for a thing's existence, but this is in no way an assertion that anybody can know what those reasons always are. And indeed we cannot always know a thing's cause, as in quantum mechanics.

Knowledge of cause is epistemological. As with anything else, knowledge can be complete, of truth or falsity, or incomplete, and of a probabilistic nature. If cause is purely efficient, then uncertainty about cause is only about efficient causes; indeed, as we'll see below, this is the way most models are interpreted. There is an unfortunate ambiguity in English with the verb to determine, which can mean to cause or to provide sufficient reason. This must be kept in mind if cause is multifaceted, because a model may speak of any of the four causes.

4.2 Cause in Models

Using a slightly different notation than above, most probability models follow the schema y ∼ f(x, θ, M), where y is the observable of interest and M the premises or evidence used to imply the form of f, i.e. the model. The function f is typically a probability distribution, usually continuous, i.e. y ∈ R. Continuity of the observable is an assumption, which is of course impossible to verify, because no method of measurement exists that could ever verify whether y is actually infinitely graduated (in whatever number of dimensions): all possible measurements we can make are discrete and finite in scope. This may seem like a small point, but since we are interested in the cause of y, we have to consider what kind of cause can itself be infinitely graduated, which must be the case if y can take infinitely many values—where the size of the infinity has yet to be discovered. Is y ∈ N, or is y ∈ R, or is y in some higher infinity still? It should make us gasp to think of how a cause can operate on the infinite integers, let alone the "real" numbers, where measure theory usually stops. But if we think we can identify cause on R, why not believe we can identify it on sets of cardinality larger still than that of R? These are mind-boggling questions, but it is now perhaps clear that the difference between potentiality and actuality is crucial. We can have a potentially infinite number of states of an observable, but only a finite number of actual states. Or we can have a potentially infinite number of observables, but only a finite number of actual observables: if any observable were infinite in actuality, that's all we would see out of our windows. Needless to say, how cause fits in with standard measure theory is an open area of research.

The model f (given by M) of the uncertainty of y is also typically conditioned on other measures x, which are usually the real point of investigation. These measures x are again themselves usually related to parameters θ, themselves also thought continuous. As said, there are a host of assumptions, many usually implicit, or implicit and forgotten, in M, which are those premises which justify the model and explain its terms. This is so even if, as is far from unusual, M is the excuse for an ad hoc model. Most models in actual use are ad hoc, meaning they were not deduced from first principles.

The stock example of a statistical model is regression, though what is said below applies to any parameterized model with (what are called) covariates or variables. Regression begins by assuming the uncertainty in the observable y is characterized by a parameterized distribution, usually the normal, though this is expanded in generalized linear regression. The first parameter µ of the normal is then assumed to follow this equation:

µ ∼ θ0 + θ1x1 + θ2x2 + · · ·+ θpxp. (13)


The description of the uncertainty in this form is careful to distinguish the probabilistic nature of the model. Equation (13) says nothing about the causes of y. It is entirely a representation of how a parameter representing a model of the uncertainty in y changes with respect to changes in certain other measures, which may or may not have anything to do with the causes of y. The model is correlational, not causal. The parameters are there as mathematical "helpers", and are not thought to exist physically. They are merely weights for the uncertainty. And they can be got rid of, so to speak, in fully correlational predictive models, i.e. where the parameters are integrated out; for example, Bayesian posterior predictive distributions; see above and [6]. In these cases, as above, we (should) directly calculate

Pr(y ∈ s|XDnM) (14)

We remind the reader that M contains all premises which led to the form (13), including whatever information is given on the priors of the parameters and so forth. Again, no causation is implied, there are no parameters left, and everything, for scientific models, is measurable. Equation (14) shows only how the (conditional on D and M) probability of y ∈ s changes as each xi does. Of course, the equation will still give answers even if no xi has any causal connection to y in any way. Any x inserted into (14) will give an answer for the (conditional) probability of y ∈ s, even when the connection between any x and y is entirely spurious. The hope in the case of a spurious xi is that the xi will show low or no correlation with y and that

Pr(y ∈ s|XDnM) ≈ Pr(y ∈ s|X−iDnM) ∀s. (15)

Indeed, if the equality is strict, then xi is said to be, as above and using the wording of [64], irrelevant for the understanding of the uncertainty of y. If the equality is violated, xi is relevant. Relevance does not imply importance, or that a cause between x and y has been demonstrated.
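To make the comparison in (14)-(15) concrete, here is a minimal sketch in Python. It assumes a normal linear model with a plug-in (maximum likelihood) approximation standing in for the full posterior predictive; the data, the candidate measure x2, the new observation, and the set s (here y < 3) are all invented for illustration.

```python
# A minimal sketch of the relevance check in (14)-(15), assuming a normal
# linear model and a plug-in (maximum likelihood) approximation to the
# posterior predictive; the data and the threshold s are invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)                               # candidate measure to check
y = 2.0 + 1.5 * x1 + rng.normal(scale=1.0, size=n)    # x2 plays no role here

def predictive_prob(cols, y, x_new, s):
    """Approximate Pr(y_new < s | x_new, D, M) for a normal linear model."""
    X = np.column_stack([np.ones(len(y))] + cols)
    beta = np.linalg.lstsq(X, y, rcond=None)[0]
    sigma = np.sqrt(np.sum((y - X @ beta) ** 2) / (len(y) - X.shape[1]))
    return stats.norm.cdf(s, loc=np.concatenate([[1.0], x_new]) @ beta, scale=sigma)

s = 3.0
x_new = [0.5, 2.0]                                    # a supposed new (x1, x2)
p_with = predictive_prob([x1, x2], y, x_new, s)
p_without = predictive_prob([x1], y, x_new[:1], s)
print(f"Pr(y < {s} | x1, x2) = {p_with:.3f}")
print(f"Pr(y < {s} | x1 only) = {p_without:.3f}")
# Near-equality across the s and x of interest marks x2 as irrelevant in the
# sense of (15); a difference shows relevance only, never cause.
```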

Another way of writing regression, which is mathematically equivalent but philosophically different, is this:

y = θ0 + θ1x1 + θ2x2 + · · · + θpxp + ε. (16)

This form is taken as directly or vaguely causal, depending on the temperament of the reader. This is a model of y itself, and not of the uncertainty of y as in (13). The last term ε is said, by some, to be normal (or some other distribution). Now to be is, or can be, an ontological claim. In order for ε to ontologically be normal, probability has to be real, a tangible thing, like mass or electron charge is. The ε is sometimes called an "error term", as if y could be predicted perfectly if it were not for the introduction of this somewhat mysterious "exogenous" cause or causes. The xi are interior to the model, meaning they pertain to y in some, usually undefined, causal way.

The indirect causal language used for models in this form is vague in practice. This form of the model says that the xi are causes of y in combination with the θi, which may be moderators of causes or the effects of causes due to the x. Which part of cause—formal, material, efficient, final—is meant is never said, though efficient causation is frequently implied. Often the θi are called effect sizes, which is causal language, meaning that unit increases in xi cause y to change by θi, which is the measurable effect. But if this is true, then this cause must always operate in just the same way, unless blocked. More on the implications of this in a moment.


Many physics or deterministic models are written in a way similar to (16), but in those cases cause is better understood. In probability models, the ε is assumed to be filled with causes, though small ones, like flea bites. The causes xi are linear in effect, or near enough so, with negligible forces outside linearity; if there are such forces they are deposited into ε or into functions of the xi. The error term is causal, but in such a way that the effects are constrained not to be linear individually, but of a certain distributional shape. Causes outside this shape may exist, as might causes outside the xi, but they are not captured by the model. Perfect predictability is rarely claimed, which is an admission that all causes have not been identified.

Now all this is very confusing; worse, the amorphous causal language is bolstered when null hypothesis significance testing is used to decide which xi "belong" in (16) and which do not. A so-called null hypothesis typically says that some θj = 0, which is taken as meaning that xj is not causal, which is an odd way to put it, as if θj itself is the true cause. The "alternate hypothesis" is that θj ≠ 0, which is everywhere taken as meaning xj is causal, or that it is something like a cause, such as a "link". When a p-value associated with θj is greater than the magic number, xj is sometimes thought not to be a cause; it is dismissed as if it has been proven not to be a cause, or as if it might be a cause but of great weakness. But then sometimes xj, even with a large p-value, is still believed to be a cause, it being said that there was not yet enough evidence to "prove" xj was a cause—or a "link".

Link is nowhere defined in the literature. It may be that xj is a direct cause or some indirect cause; it is never quite specified, nor is the part of cause identified. It may be that a third, or a fourth, or some measure further down the chain changes in response to y, which in turn, back up the chain, causes xj to change. Now whatever the true cause is, it either happens to y or it happens to µ. But a cause can only happen to a parameter if that parameter materially exists. There does not exist, as far as I can discover, any explanation of how any causal power may change the value of a parameter: by what mechanism? Instead, it appears to be believed that observables y themselves have distributions. We see this in language that says y are "drawn" from distributions, which must therefore exist in some Platonic realm (e.g. "y is normal"). Frequentists are, of course, committed to believing this, though most have not thought out the implications of this assertion; again, see [51, 52] for criticisms. Finally, if observables really do have distributions in an ontological sense, then it must be that causes operating on observables really do cause changes in their immaterial parameters—which must then exist as real things in subsequences of sequences going to the limit (the definition of frequentism). Frequentism is Platonism. Even if any of this is true, it should at least be clear to the reader that all is not well thought out. The alternative is to forgo frequentism and move to more logically consistent theories of probability, where cause is constrained to observables and tangible measures; i.e. a theory that is reality-based.

Regression is the paradigmatic parameterized probability model. There are other kinds of models, though, which are not parameterized but which are still discussed as if they are causal and yet have probabilistic interpretations. Neural nets and the like fall into this class of probability model. Classification and regression trees, random forests, and various other so-called machine learning algorithms do, too; see [7]. Without getting into too much detail, artificial intelligence and generic machine learning models produce strings of decision rules like this:

If (x12 < a and b ≤ x7 < c) or (x25 = Yellow ...) then y ∈ s. (17)

These pronouncements can and usually are made probabilistic by grouping rules and permutations of rules together, such that when the algorithm is given a set of x, it produces a probability of y ∈ s (or something which is treated like a probability). Since these algorithms work with actual observations, they usually maintain a discrete and finite stance. This is in their favor since no ad hoc assumptions about formal probability models are made. When any formal probability is overlaid upon them, they can be treated like other parameterized probability models.
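As a toy illustration of rule-producing algorithms of the kind sketched in (17), the following uses scikit-learn's decision tree on synthetic data; the features, sample size, and tree depth are arbitrary choices made only for the example, not anything prescribed here.

```python
# A minimal sketch: a decision tree produces if-then rules like (17) and a
# relative frequency within each leaf that is treated like a probability.
# All data and settings are synthetic and arbitrary.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(2)
n = 500
X = rng.normal(size=(n, 3))
y = ((X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=n)) > 0).astype(int)

tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)
print(export_text(tree, feature_names=["x1", "x2", "x3"]))   # the if-then rules

x_new = np.array([[0.2, -1.0, 0.7]])
print("Pr(y in s | x_new) ~", tree.predict_proba(x_new)[0, 1])
# Nothing in the rules, or in the leaf frequencies, says anything about cause.
```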

Various indirect measures of importance for each xi are made in these models, such as examining how accuracy (defined relative to in-sample predictions of y) increases or decreases if xi is removed from the model. If a particular xj is never or rarely selected in a rule, it is thought not to be causal, or of low causal power; otherwise it is seen in the same senses as with regression above. The ascription of cause is as with parameterized models: the language is vague and claims of cause are shifting.

The hope of these non-parameterized models, especially of human behavior or biology, is that if only enough different xi are collected, y can be predicted with certainty or with near-certainty. Even if it is acknowledged that this ultimate-data-collection hope will not be realized in practice, it still exists as a hope. When it does, it is a tacit acknowledgement that the models are thought causal, at least partially. This hope also assumes all causes are measurable, and therefore are material. These are very strong claims. In physics, this hope is known to be false, as said. Perhaps a Bell-like theorem of the unknowability of cause in human behavior will someday be found.

This is very important for economics, since all measures and observables are caused ultimately by human behavior, so it is worth emphasizing. For some physical processes in quantum mechanics, it is well known that perfect, i.e. certain, prediction is impossible. Measures xi cannot be found that will bring certainty. This is why, besides Bell's proofs, Heisenberg's Uncertainty Principle has such an apt name. Free will, or free choice, which of course is implicit in all observables about human behavior, is also not predictable in an analogous way. Some who use a computer-as-brain metaphor believe that all human behavior is in principle deterministic, i.e. that it has distinct material causes, and that free will is an "illusion" (it is never stated who is having the illusion). But these expressions are hopeful only, because no causal mechanism is known that can cause a being to believe it is conscious and has experiences of choice. Indeed, some philosophers believe human intellect and will to be non-material ([33]), and therefore that which is causing changes in will cannot be measured (effects of will can, however, be measured). Others argue against the computer-as-brain metaphor, e.g. [75]. These counter-arguments (to the hope that all life is strictly deterministic) collectively can be taken, at least in a metaphorical sense, as the start of a Bell-like theorem that the cause of will cannot with certainty be known. This means models which take any account of human behavior, which includes all economic and financial observables, must in the end remain in the correlational realm with regard to behavior.

4.3 When Cause is not a Cause

It cannot be denied that cause can be known, and that some causal x can be put into uncertainty models. Indeed, when this happens, which it does especially in those models which are frequently tested against reality, it is discovered that testing (and p-values) "verifies" these causes. Yet they were known, or suspected, before testing. And indeed must have been, as we shall see. How cause is known is discussed presently. We must first understand that cause is not simple. We remind the reader of a simple example. Consider an experiment which measures exposure or not to some dread thing and whether or not people developed some malady. Hypothesis testing might "link" the exposure to the disease; and, indeed, everybody would act as if they believed the exposure has caused the malady. But the exact opposite can be true.

It will almost surely be the case that not everybody in the exposed group will have developed the malady; and it will also be that some people in the not-exposed group will have the malady. It thus cannot be that the people in the not-exposed group had their disease caused by the exposure, for of course they were not exposed. It then necessarily follows that their malady was caused by something other than the exposure. This, then, is definitive proof that at least one more cause than the supposed cause of the exposure exists. There is no uncertainty in this judgment.

We still do not know if the exposure was a cause. It could be that every person in the exposed group had their disease caused by whatever caused the disease in the not-exposed group—or there could even be other causes that did not affect anybody in the not-exposed group but that, somehow, caused disease in the exposed group. It could be that exposure caused some disease, but there is no way to tell, without outside assumptions, how many more maladies (besides the known other cause(s)) were caused by the exposure.

It's worse still for those who hold that uncertainty models can discover cause. For how do we explain those people in either group who did not develop the disease? Even if exposure causes disease sometimes, and the other (unknown-but-not-exposure) cause which we know exists only causes disease sometimes, we still do not know why the exposure or non-exposure causes disease only sometimes. Why did these people develop the malady and those not? We don't know. We can "link" various correlations as "cause blockers" or "mitigators", but we're right back where we started from. We don't know, from the data alone, what is a cause and what is not, and what blocks these (at least) two causes sometimes but not others.

Cause therefore, like probability, is known conditionally. In our most rigorous attempts at discovering cause, we design an experiment to demonstrate a causal connection or effect: x is believed to cause a change in y. Every known or believed cause or moderator of y or of x is controlled, to the best of the experimenter's ability. Then x is deployed or varied and the change in y is measured as precisely as possible. If after x changes we see y change, or we see a change of a certain magnitude and the like, we say x is indeed a cause of y (in some form).

But this judgment supposes we have correctly identified every possible cause or moderator of y or of x. Since in science we deal with the contingent, y will be contingent. This means that even in spite of our certainty that x is a, or is the, cause of y, there always exists the possibility (however remote) that something unknown was responsible for the observations, or that something blocked or prevented x from using its powers to change y; [29, 87, 88]. This possibility is theoretical, not necessarily practical. In practice we always limit the number of possible causes x to a finite set. If we design an experiment to deduce if x = "gravity" caused the y = "pencil to drop", we naturally assume, given our background knowledge about gravity and its effects on things like pencils, that the pencil will drop because of gravity. Yet if the pencil does drop, it remains a theoretical possibility that something else, perhaps a mysterious pencil-pulling ray activated by pranksters, caused the pencil to drop, and not gravity. There is no way to prove this is not so. Yet we implicitly (and rightly!) condition our judgment on the non-existence of this ray and on any other freakish cause. It is that (proper) conditioning which is key to our understanding of cause.

This discussion might seem irrelevant, but it is not. It in fact contains the seed of the proof that algorithms automated to discover the cause of observables must contain in advance at least the true cause of y. And so it must be that cause, or rather knowledge of cause, is something that can only be had in the mind. No algorithm can determine what was a cause or was not, unless that algorithm was first "told" which are causes and which are not. This is proved shortly. The point of this exercise is to exhort researchers not to preach with too much vigor and too much certainty about their results, especially in those instances where a researcher claims a model has backed up his claims. All models only do what they were told; that a model fits data is not independent verification of the model's truth.

A true cause x of y, given identical conditions, and where identical is used in its strictest sense, will always lead to perfect correlations (not necessarily linear, of course) of an observable. An algorithm can certainly note this perfect correlation, and it can be "told" to say things like "If a perfect correlation is seen at least so-many times, a cause exists." But perfect correlations are not always indicative of cause. Samples, even though thought large, can be small, and the correlation spurious. The direction of causality can be backwards, where it's not x causing y, but y causing x. Third, and of even greater importance, removed measures might be causing the observations: e.g. w is causing v and z, which are in turn causing y and x, which is all we measure. These kinds of remote-cause scenarios multiply. In the last, if w is not measured, or v or z are not, then the algorithm has no way to identify cause. If they are measured and in the algorithm, it must be because the cause was already suspected or known.

4.4 Cause is in the Mind, not the Data

Suppose you fed into the algorithm a series of numbers, starting at 1, then 2, and so on. The machine discovers the rule that for any three of these numbers x, y, and z, "If x = y and y = z, then x = z." The rule is true for all the numbers fed into the algorithm. But is it always true, i.e. true for numbers not yet fed into the algorithm? The algorithm cannot tell us—unless, again, it were pre-programmed to announce that after so-many examples the universal is true. By "universal" we mean the proposition holds for an infinity of natural numbers. This example is, of course, one of Peano's axioms for mathematics (in formulations that include the equality axioms), believed by all (who consider it) to be true. But that is because humans have the ability to extract this universal from data, whereas an algorithm cannot. In this case, no algorithm can ever consider an infinity of numbers; whereas we can. Consider that the universal is not necessarily true in this case, because the algorithm may have been fed numbers from some process which after a point sees the sequence become intransitive, at which point the rule breaks down. The algorithm cannot know what is beyond what it sees. The difference between the universal and the eventually intransitive sequence is conditioning. When we judge the axiom true, we condition on the idea that it applies to numbers not tied to any process, except their progression. But the algorithm cannot know this; it cannot know anything. This is why it can be fooled if fed limited sequences. People can be fooled, too, of course, but a person, and not an algorithm, would understand whether the input was part of a contingent process or was purposely the sequence of natural numbers.

Now that is an obscure and philosophical answer to the question whether algorithms can discover causes. It also runs contrary to those who argue algorithms can indeed mimic human thinking; perhaps not now, but eventually; see [83]. So here is a simple proof that cause is in the mind and not the data. A silly thought experiment: pick something that happened. It doesn't matter what it is, as long as it happened. Something caused this thing to happen; which is to say, something actual turned the potential (of the thing to happen) to actuality. All four facets of cause were involved, of course. I will take, as my example, the death of Napoleon. One afternoon he was spry, sipping his Grand cru, and planning his momentous second comeback, and the next morning he was smelling like week-old Brie.

Next we want to design an algorithm to discover the cause of this thing (in all four aspects of cause, or even just the efficient cause). This can be a regression, machine learning routine, neural net, deep learning algorithm, artificial intelligence routine, anything you like. Plug into this algorithm, or into a diagram in the computer, or into whatever device you like, THE EVENT. Then press "GO" or "ACTIVATE" or whatever it is that launches the algorithm into action. What will be the result?

Nothing, of course. This lifeless algorithm cannot discover cause, because it has been given no "data". There is nothing for the algorithm to process: no data to work on. There are no "x" to tie to the "y", i.e. the event. So which data should we put in? We have to choose. There are an infinite number of measures (x) available, and any machine we design will have finite capacity. We must do some kind of winnowing to select only certain of these infinite x. Which? How about these (keep in mind my event): (x1) The other day I was given a small bottle of gin in the shape of a Dutch house in delft blue. (x2) You weren't supposed to drink the gin, but I did. (x3) In my defense, I wasn't told until after I drank it that I shouldn't have. (x4) It wasn't that good.

Now if these x seem absurd to you, you have proven that cause is in the mind and not the data, that it is you who are extracting cause from data and not algorithms. For you have used your knowledge of cause to discern there could not be any possible connection between a novelty bottle of gin and the death of a tyrant two centuries previously. We can't let an algorithm figure out the cause if we do not first feed the algorithm x which we believe or suspect are in the causal path of y, and tell it which measure of relation between an x and y is appropriate to use in judgment and whether this measure has crossed the admitted threshold. And even this does not solve the full causal path problem, where other measures may be causing the x and y.

Again, there are an infinite number of measures x. Everything that's ever happened, in the order it happened, is data. That's a lot of data. How can any algorithm pick cause out of all that to tell us the cause of any event? That tall order is thus not only tall, but impossible, too, since everything that's ever happened wasn't, for the most part, measured. And even if it was, no device could store all this data or manipulate it.

We could argue that only data related to y in some way should be input into the algorithm, perhaps the relations discovered by previous algorithms. But related must mean those measures which are the cause of the event, or which are not the direct causes, but incidental ones, perhaps measures caused by the event itself, or measures that caused the cause of the event, and that sort of thing. Those measures which are in (we can call it) the causal path. Any previous algorithm used to winnow x down to a suitable subset of related items must itself have been told which x of the infinite choices to start with. And so on for any algorithms "upstream" of ours. There thus must come a point where human intelligence, which has the ability to extract universals like cause from data, albeit imperfectly, comes into play and does the choosing. Algorithms are thus only good for automating the tedious tasks of computation.

The best any algorithm can do is to find prominent correlations, which may or may not be directly related to the cause itself (and some may be spurious), using whatever rules of "correlation" we pre-specify. These correlations will be better or worse depending on our understanding of the cause and therefore of what "data" we feed our algorithm. The only way we know these data are related to the cause, or are the cause, is because we have a power algorithms can never have, which is the extraction of and understanding of universals. To wrap up: cause is hard, and we're better off claiming nothing more than correlation in most instances.

4.5 Tests of Stationarity and Cause

The notion of stationarity is ubiquitous in time series modeling, e.g. [18]. There is often great concern that a given series is not stationary, a concern which has given rise to a suite of tests, like the unit-root or Dickey-Fuller tests, e.g. [30]. We need not explore the mathematical details. The idea is that the probability model for a non-stationary series for observable yt has parameters, such as those representing covariance, that change in time, and that if this is so, ordinary methods of estimation will fail when making statements about other parameters, such as those for auto-regression. This is true mathematically, but not causally.


Now all yt in a series are caused; but it cannot be, in ordinary time series models, that any yt−j, for any j and t, caused yt in any efficient sense. For instance, last year's GDP did not cause this year's GDP to take the value it did. They may share common causes, of course, and they also may not. Since probability is not ontic, i.e. does not exist materially, it cannot be the series itself that is not stationary; rather, the causes underlying it are not constant. Cause can change in all the ways indicated above: by form, material, efficient power, or final. We have simple proof that at least one cause changed, or a new one intervened, whenever yt ≠ yt−1. So when a series is said not to be stationary, it only means that the causes of the observable have changed enough that the distribution representing uncertainty in the observable's variables must be changed at a point or points.

Explanation is, as we saw, a model goal, and knowledge of cause is the best explanation, and this is so in time series as in any other data. There are thus two goals in time series modeling: to say when a thing like a "change point" (a change in the nature of causes) occurred, and prediction. Both goals are predictive.

If we seek to say when a change point happened, then we must posit a model of change, which we treat predictively like any other model. It will issue predictions Pr(Change at t | DnY) = pt, t = 0, 1, . . . , T (we don't need the X here). To pick which t was the culprit requires having a decision rule, just as we must have a decision rule to move from probability to point in any model.

Yet often we don't care when the change was, because our interest is in prediction. In that case, we don't have to pick which t was the change point, and we simply make predictions of yt+k, k > 0, weighted (in the suitable and obvious way) by every possible change point t. The change point is integrated out like any parameter. In any case, testing is not needed.
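A minimal sketch of this predictive treatment of a change point, under strong simplifying assumptions (a normal mean-shift model with known variance, plug-in segment means in place of full marginal likelihoods, and a uniform prior over the change point), might look like the following; the series is synthetic.

```python
# A minimal sketch: Pr(change at t | D) for a synthetic series, with the
# change point then integrated out for prediction rather than tested for.
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
y = np.concatenate([rng.normal(0.0, 1.0, 60), rng.normal(1.5, 1.0, 40)])
T, sigma = len(y), 1.0

def seg_loglik(seg):
    """Plug-in log-likelihood of a segment under its own mean."""
    return stats.norm.logpdf(seg, loc=seg.mean(), scale=sigma).sum()

log_post = np.array([seg_loglik(y[:t]) + seg_loglik(y[t:]) for t in range(1, T)])
post = np.exp(log_post - log_post.max())
post /= post.sum()                                 # Pr(change at t | D), t = 1..T-1

means_after = np.array([y[t:].mean() for t in range(1, T)])
pred_mean = np.sum(post * means_after)             # change point integrated out
print("most probable change point:", 1 + int(post.argmax()))
print("predictive mean for y_{T+1}:", round(pred_mean, 3))
```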

4.6 The End of Cause

There is much more to this discussion (see [11]), but I have taxed the reader too much already, with much of the discussion seemingly arcane. The hope is that if the reader does not see the arguments above as definitive proof, then he will at least see there is good evidence that our traditional means of ascribing cause are at the very least too certain, if not often in downright error.

Consider, lastly, a typical (but fictional) news headline, drawn from a scientific statistical study: "New Study Reveals Those Who Read At Least Five Books A Year Make More Money." The implication is, of course, that reading (at least five but not four books) causes higher incomes. News outlets will have none of the caveats we presume the conscientious researchers will put into their paper; the media (and many government agencies) will assume cause has been definitively proven. It is the fault of researchers everywhere for not correcting these exceedingly common over-statements.

Yet even if researchers admit to limitations, they will make the same mistakes as the press. We are in the same situation as the exposure and not-exposure example of causing disease. Even if it is true that reading at least five (and not four or three) books causes higher incomes, it won't always do so, yet in the Discussion section of their paper the researchers will have surely launched into great theories of how the reading caused the higher incomes, implying the cause happened to all readers and with the same force. Even if the differences between (what are called) readers and non-readers are small, success will be claimed, when it is far from clear the direction of the cause is consistent, whether a third cause intervened, and whether the claimed cause is even real or reproducible.

Consider this more mundane headline that might be generated by the reality-based, predictive method using the same data: "There Is A 6% Chance Those Who Read At Least Five Books A Year Will Make At Least $400 More Annually, But Only If The Following Conditions Hold." This is both predictive (6%) and verifiable ($400+, the following conditions). Yet that kind of announcement, were it a university press release, while being vastly more honest, is not the sort of information that will get a researcher's (or the university's) name into the paper.

5 TRUST BUT VERIFY

However many models are left in consideration at the end of the modeling process, unless those models were deduced from first principles (see [11], Chapter 8, for examples), there will or should be uncertainty whether the model or models are of any use in reality. A tremendous weakness of hypothesis testing is that it certifies, if you like, a model's goodness by requiring only that it evince at least one wee p-value. This is an absurd situation when we recall both how easy it is to produce wee p-values, and that the vast majority of models in use are ad hoc; regression being the largest example.

5.1 The Intrusion of Reality

Scarcely any who use statistical models ever ask: does the model work? Not work in the sense that data can be fit to it, but work in the sense that it can make useful predictions of reality, of observations never before seen or used in any way. Does the model verify? Verification is a strict test, a test that models in physics, chemistry, meteorology, and all engineering fields must pass to be considered viable. In those fields, models are not just proposed; they are proposed and tested. Would you strap yourself into a brand new flying car built on speculative theoretical principles but never tested in any way, except for how good the model looked on a computer?

Some fields never verify their models. They are content with hypothesis testing and the many opportunities for theorizing (some might say "pontificating") that method provides. There is thus no real way to know how good the models in these areas really are. Over-confidence must be the rule, however, else we would not have the replication crisis mentioned above.

Verification in economic data is not uncommon in time series models. Time series models are set in a naturally predictive form, where predictions are (or should be) a matter of course. Many formal verification methods have accordingly come from this research; e.g. [2]. Another fecund field is (readers might be surprised to learn) meteorology. Weather and climate forecasts appear with regularity and the demand for accuracy is keen. Great strides in mathematical verification methods have accordingly arisen in these areas; see [103].

The process of verification in its ideal form is simplicity itself: (1) create the model or models using old observations, (2) make probabilistic predictions using them, (3) wait for new observations to accrue, and (4) score those models with respect to the decisions that were made using the predictions. This is exactly how you would assess driving a new flying car.

Scientists, economists, and other researchers are often impatient about step (3). New observations typically have not arrived by the time papers must be published, and, as everybody knows with absolute certainty, it really is publish or perish. Ideally, researchers would wait until they have accumulated enough new, never-before-used-in-any-way observations so that they could prove their proposed models have skill or are useful. The rejoinder to this is that requiring actual verification would slow research down. But this is a fallacy, because it assumes what it seeks to prove; namely, that the new research is worthy. The solution, which ought to please those in need of publications, is two-fold: use verification methods to estimate model goodness using the old data, which is itself a prediction, and then, when new observations finally do become available, perform actual verification (and write a second paper about it). Of course, this last step might too often lead to disappointment, as it can reveal depressing conclusions for those who loved their models too well.

The point about the observations used in verification having never been used in any way cannot be over-stressed. Many methods, like cross validation, use so-called verification data sets to estimate model goodness. The problem is that the temptation to tweak the original model so that it performs better on the verification set is too strong for most to resist. I know of no references to support this opinion, but having felt the temptation myself (and given in to it), I am sure it is not uncommon. Yet when this is done it in essence unites the training and validation data sets so that they are really one, and we do not have a true test of model goodness in the wild, so to speak.

Yet we do have to have some idea of how good a model might be. It may, for instance, be expensive to wait for new observations, or those observations may depend on the final model chosen. So it is not undesirable to have an estimate of future performance. This requires two elements: a score or measure of goodness applied to old observations, and a new model of how that score or measure will reproduce in new observations. As for scores and measures, there are many: years of research have left us well stocked with tools for assessing model predictive performance; e.g. [41, 16, 73, 74, 85, 55, 15]. A sketch of those follows presently. But it is an open question, in many situations, how well scores and measures predict future performance, yet another area wide open for research. In order to do this well, we not only need skillful models of the observables Y, but also of the measures X, since all predictions are conditional on those X. The possibilities here for new work are nearly endless.

5.2 Verification Scores

Here is one of many examples of a verification measure, the continuous ranked probability score (CRPS), which is quite general and has been well investigated, e.g. [40, 55]. We imagine the CRPS to apply to old observations here for use in model fit, but it works equally well scoring new ones.

Let Fi(s) = Pr(Y < s|XiDnM), i.e. a probabilistic prediction of our model for past observation Xi. Here we let s vary, so that the forecast or prediction is a function of s, but s could be fixed, too. Let Yi be the i-th observed value of Y. Then

CRPS(F, Y) = ∑i (Fi − I{s ≥ Yi})², (18)

where I is the indicator function. The score essentially is a distance between the cumulative distribution of the prediction and the cumulative distribution of the observation (a step function at Yi). A perfect forecast or prediction is itself a step function at the eventually observed value of Yi, in which case the CRPS at that point is 0. Lower scores in verification measures are better (some scores invert this). The "continuous" part of the name is because (18) can be converted to continuity in the obvious way; see below for an example. If F is not analytic, numerical approximations to CRPS would have to suffice, though these are easy to compute. When Fi = pi, i.e. a single number, which happens when Y is dichotomous, the CRPS is called the Brier Score.
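As a small numerical illustration of (18) and its continuous version, the following sketch computes the squared distance between a normal predictive CDF and the step function at the observation over a grid of s values; the predictive parameters and the observation are invented.

```python
# A minimal numerical sketch of the CRPS: squared distance between the
# predictive CDF F(s) and the step function at the observation, over a grid.
import numpy as np
from scipy import stats

s_grid = np.linspace(-10, 10, 2001)
ds = s_grid[1] - s_grid[0]
F = stats.norm.cdf(s_grid, loc=1.0, scale=2.0)   # an invented predictive CDF
y_obs = 2.5

crps = np.sum((F - (s_grid >= y_obs)) ** 2) * ds
print("numerical CRPS:", round(crps, 4))
# For a dichotomous Y the predictive CDF collapses to a single probability p,
# and the score reduces to the Brier score, mean((p - y)**2).
```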

Expected scores are amenable to decomposition, showing the constituent components of performance; e.g. [17]. One component is usually related to the inherent variability of the observable, which translates into an expected non-zero minimum of a given score (for a certain model form); for a simple example with the Brier score, see [72]. This expected minimum phenomenon is demonstrated below in the continuing example. Model goodness is not simply captured by one number, as with a score, but also by examining calibration, reliability, and sharpness. Calibration is of three dimensions: calibration in probability, which is when the model predictions converge to the prediction-conditional relative frequency of the observables, calibration in exceedance, and calibration in average; these are all mathematically defined in [41]. There is not the space here to discuss all aspects of scoring, nor indeed to give an example of validation in all its glory. It is much more revealing and useful than the usual practice of examining model residuals, for a reason that will be clear in a moment.
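A toy check of calibration in probability for dichotomous predictions might look like the sketch below, in which synthetic forecast probabilities are binned and compared with the observed relative frequencies; it only illustrates the idea and is not a substitute for the formal definitions in [41].

```python
# A minimal sketch of calibration in probability: binned forecast
# probabilities versus observed relative frequencies, on synthetic data.
import numpy as np

rng = np.random.default_rng(4)
p = rng.uniform(0, 1, 2000)          # announced probabilities
y = rng.binomial(1, p)               # outcomes generated to be calibrated

bins = np.linspace(0, 1, 11)
idx = np.digitize(p, bins) - 1
for b in range(10):
    mask = idx == b
    if mask.any():
        print(f"[{bins[b]:.1f}, {bins[b+1]:.1f}): "
              f"mean forecast {p[mask].mean():.2f}, observed freq {y[mask].mean():.2f}")
# Calibration in probability holds when the two columns roughly agree;
# sharpness asks how far the forecasts stray from the base rate while
# remaining calibrated.
```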

CRPS is a proper score and it is sensitive to distance, meaning that observations closer to model predictions score better. Proper scores are defined conditional (as are all forecasts) on X, Dn, and M; see [91] for a complete theoretical treatment of how scores fit into decision analysis. Given these, the proper probability is F, but other probabilities could be announced by scheming modelers; say they assert G ≠ F for at least some s, where G is calculated conditional on tacit or hidden premises not part of M. Propriety in a score is when

∑i S(Gi, Yi)Fi ≥ ∑i S(Fi, Yi)Fi. (19)

In other words, given a proper score the modeler does best when announcing the full uncertainty implied by the model and not saying anything else. Propriety is a modest requirement, yet it is often violated. The popular scores RMSE, i.e. √(∑i (F̂i − Yi)²/n), and mean absolute deviation, i.e. ∑i |F̂i − Yi|/n, where F̂i is some sort of point forecast derived as a function of Fi, are not proper. The idea is that information is thrown away by collapsing the forecast into a point, when we should instead be investigating the full probability: a point is a decision, and not a probability. Of course, points will arise when decisions must be made, but in those situations actual and not theoretical cost-loss should be used to verify models, the same cost-loss that led to the function that computed the points. Similarly, scores like R² and so on are also not proper.
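A toy numerical illustration of propriety in the sense of (19), using the Brier score: the expected score, computed under the probability the modeler actually holds, is minimized by announcing that probability and nothing else. The value p = 0.3 is invented.

```python
# A minimal sketch: under the Brier score, announcing q != p does worse in
# expectation than announcing the held probability p, which is propriety.
import numpy as np

p = 0.3                                  # the modeler's actual probability
q = np.linspace(0.01, 0.99, 99)          # candidate announcements
expected_score = p * (q - 1) ** 2 + (1 - p) * q ** 2
print("best announcement:", round(q[expected_score.argmin()], 2))   # ~ 0.3
```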

If F (as an approximation) is a (cumulative) normal distribution, or can be approximated as such, then the following formula may be used (from [40]):

CRPS(N(m, s²), Y) = s [ ((Y − m)/s)(2Φ((Y − m)/s) − 1) + 2φ((Y − m)/s) − 1/√π ], (20)

where φ and Φ are the standard Normal probability density function and cumulative distribution function, and m and s are known numbers given by our prediction. These could arise in regression, say, with conjugate priors. Estimates of (20) are easy to have in the obvious way.
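A minimal sketch of (20) as a function, assuming the reconstruction above and scipy's standard normal density and distribution functions; the values of m, s, and Y are invented, and the result should approximately match the numerical CRPS computed in the earlier sketch.

```python
# A minimal sketch of the closed-form CRPS in (20) for a N(m, s^2) predictive.
import numpy as np
from scipy import stats

def crps_normal(m, s, y):
    z = (y - m) / s
    return s * (z * (2 * stats.norm.cdf(z) - 1) + 2 * stats.norm.pdf(z) - 1 / np.sqrt(np.pi))

print(round(crps_normal(1.0, 2.0, 2.5), 4))   # ~ the numerical value above
```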

The CRPS, or any score, is calculated per prediction. For a set of predictions, the sum or average score is usually computed, though because averaging removes information it is best to keep the full set of scores and analyze those. CRPS_i can be plotted against Y_i, or indeed against any x_i. How best to use the information contained in verification scores is another open area of research.

Next we need the idea of skill. Skill is had when one model demonstrates superiority over another, given by and conditional on some verification measure. Skill is widely used in meteorology, for example, where the models being compared are often persistence and the fancy new theoretical model proposed by a researcher. This is a highly relevant point, because persistence is the forecast that essentially says "tomorrow will look exactly like today", with that statement affixed with the appropriate uncertainty, of course. If the new theoretical model cannot beat this simple, naive model, it has no business being revealed to the public. Economists making time series forecasts are in exactly the same situation. Whatever model they propose should at least beat persistence. If it cannot, why should the model be trusted when it is not needed to make good predictions?
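A minimal sketch of the persistence comparison follows; the "model" and the series are simulated, and mean absolute error is used purely for illustration (it is not proper, as noted above).

```python
import numpy as np

def mae(forecast, actual):
    return float(np.mean(np.abs(np.asarray(forecast) - np.asarray(actual))))

def skill_vs_persistence(model_forecasts, series):
    """Skill-type ratio comparing a model's one-step-ahead forecasts with the
    naive persistence forecast ('tomorrow will look like today')."""
    persistence = series[:-1]              # forecast for time t+1 is the value at time t
    actual = series[1:]
    s_model = mae(model_forecasts, actual)
    s_naive = mae(persistence, actual)
    return (s_naive - s_model) / s_naive   # > 0 only if the model beats persistence

# A noisy random walk: persistence is hard to beat, and a 'fancy' model that
# merely adds noise to persistence shows negative skill.
rng = np.random.default_rng(2)
series = np.cumsum(rng.normal(size=200))
fancy = series[:-1] + rng.normal(scale=0.5, size=len(series) - 1)
print(skill_vs_persistence(fancy, series))
```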

It is not only time series models that benefit from computing skill. It works in any situation: for example, in a regression where one model has p measures and another has, say, p + q. Even if a researcher is happy with his model with p measures, it should at least be compared to one with none, i.e. where uncertainty in the observable is characterized by the distribution implied by the model with no measures. In regression, this would be the model with only the intercept. If the model with the greater number of measures cannot beat the one with fewer, the model with more is at least suspect, or has no skill. Because of the potential for over-fitting, it is again imperative to do real verification on brand new observations. How skill and information-theoretic measures are related is another open area of investigation.

Skill scores K have the form

$$K(F, G, Y) = \frac{S(G, Y) - S(F, Y)}{S(G, Y)}, \qquad (21)$$

where F is the prediction from what is thought to be the superior or more complex model and G the prediction from the inferior. Skill is always relative. Since the minimum (best) score is S(F, Y) = 0, and given the normalization, a perfect skill score has K = 1. Skill exists if and only if K > 0; otherwise it is absent. Skill, like a proper score, can be computed as an average over a set of data, or individually over separate points. Plots of skill can be made in an analogous way.
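Once per-observation scores are in hand, (21) is a one-line computation. The sketch below uses simulated scores (not the housing data) for a model with p measures (F) and the intercept-only model (G), computed both per point and on the averaged scores.

```python
import numpy as np

def skill_score(score_f, score_g):
    """Skill (21) of model F relative to the simpler model G, for any negatively
    oriented score (lower is better); K > 0 means F has skill."""
    score_f = np.asarray(score_f, dtype=float)
    score_g = np.asarray(score_g, dtype=float)
    return (score_g - score_f) / score_g

# Simulated per-observation CRPS for a full model (F) and for the
# intercept-only model (G); purely illustrative numbers.
rng = np.random.default_rng(3)
crps_full = rng.gamma(2.0, 0.05, size=100)
crps_intercept = crps_full * rng.normal(1.02, 0.10, size=100)

print(skill_score(crps_full, crps_intercept)[:5])              # per-point skill, for plotting
print(skill_score(crps_full.mean(), crps_intercept.mean()))    # skill of the averaged scores
```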


Receiver operating characteristic (ROC) curves, which are very popular, are not to be preferred to skill curves, since ROC curves do not answer questions of usefulness in a natural way; see [16] for details.

We should insist that no model be published without details of how it has been verified. If it has not been verified with never-before-seen observations, this should be admitted. Estimates of how the model will score on future observations should be given. And skill must be demonstrated, even if only with respect to the simplest possible competing model.

5.3 Example Continued

We now complete the housing price example started above. Fig. 2 showed all predictions based on the assumption that the old X are likely to be seen again in the future, which is surely not implausible. Individual predictions with their accompanying observations can also be shown, as in Fig. 3, which displays four observations from the data (picked at random), with predictions made assuming their X are new.

These plots are in the right format for computing the CRPS, which is shown in Figs. 4 and 5. The first shows the normalized individual CRPS (the actual values are not important, except relatively) by the observed prices. Scores away from the middle prices are on average worse, which will not surprise any user of regression models; only here we have an exact way of quantifying "better" and "worse." There is also a distinct lowest value of the CRPS. This is related to the inherent uncertainty in the predictions, conditional on the model and old data, as mentioned above. This part of the verification is most useful in communicating the essential limitations of the model. Perfect predictions, we project, assuming the model and the CRPS will be used on genuinely new data, are not possible. The variability has a minimum, and it is not low. An open question is whether this lower bound can be estimated for future data by assuming the model and data; equation (20) implies the answer is at least sometimes yes (it can be computed in expected value assuming M's truth).
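To illustrate that parenthetical claim, here is a calculation of my own under the assumption that M is true and the prediction is the normal distribution in (20): if Y ~ N(m, s^2) and the announced prediction is N(m, s^2) itself, the expected CRPS works out to s/√π, a floor that cannot be beaten on average by any honest use of this model. A quick Monte Carlo check:

```python
import numpy as np
from scipy.stats import norm

def crps_normal(y, m, s):
    # Equation (20), oriented so that lower is better
    z = (y - m) / s
    return s * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

m, s = 20.0, 5.0
rng = np.random.default_rng(4)
y = rng.normal(m, s, size=200_000)      # observables generated as if M were true
print(crps_normal(y, m, s).mean())      # Monte Carlo estimate of the expected CRPS
print(s / np.sqrt(np.pi))               # analytic value, about 2.82: the expected minimum
```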

Now plots like Fig. 4 can be made of the CRPS by the Y or by any X, so it can be learned what exactly is driving good and bad performance. This is not done here. Next, in Fig. 5 the individual CRPS of both models, with and without nox, are compared. A one-to-one line is overdrawn. It is not clear from examining the plot by eye whether adding nox has benefited us.

Finally, skill (21) is computed, comparing the models with and without nox. A plot of individual skill scores as related to nox is given in Fig. 6. A dashed red line at 0 indicates points which do not have skill. Similar plots for the other measures may also be made.

The full model does not do that well. There are many points at which the simpler model bests the more complex one, with the suggestion that for the highest values of nox the full model does worst. The overall average skill score was K = -0.011, indicating the more complicated model (with nox) does not have skill over the less complicated model. This means, as described above, that if the CRPS represents the actual cost-loss score of a decision maker using this model, the prediction is that on future data the simpler model will outperform the more complex one.

Whether this insufficiency in the model is due to probability leakage, or to the CRPS not being the best score in this situation, remains to be seen.
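The workflow of this subsection can be sketched as follows. This is not the code used for the figures: it substitutes a plug-in normal approximation from ordinary least squares for the full predictive distribution, the file name and the column names (price, nox, rm, age) are placeholders for the Harrison and Rubinfeld data [54], and the train/test split is arbitrary.

```python
import numpy as np
import pandas as pd
from scipy.stats import norm

def crps_normal(y, m, s):
    # Equation (20), lower is better
    z = (y - m) / s
    return s * (z * (2 * norm.cdf(z) - 1) + 2 * norm.pdf(z) - 1 / np.sqrt(np.pi))

def fit_ols(X, y):
    """Least-squares fit returning coefficients and residual scale,
    a rough plug-in stand-in for the predictive distribution."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(X1, y, rcond=None)
    sigma = (y - X1 @ beta).std(ddof=X1.shape[1])
    return beta, sigma

df = pd.read_csv("housing.csv")              # placeholder path for the data of [54]
train, test = df.iloc[:400], df.iloc[400:]   # hold out never-before-seen observations

models = {"with nox": ["nox", "rm", "age"],  # illustrative measures
          "without nox": ["rm", "age"]}

scores = {}
for name, cols in models.items():
    beta, sigma = fit_ols(train[cols].to_numpy(), train["price"].to_numpy())
    X1 = np.column_stack([np.ones(len(test)), test[cols].to_numpy()])
    scores[name] = crps_normal(test["price"].to_numpy(), X1 @ beta, sigma)

# Average skill (21) of the nox model over the reduced model; K <= 0 means no skill.
K = (scores["without nox"].mean() - scores["with nox"].mean()) / scores["without nox"].mean()
print(K)
```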


[Figure: four panels, each plotting Pr(Price < s | XDM) against Price = s for the selected observations.]

Fig. 3 The probability prediction of housing prices at four old X, assumed as new. An empirical CDF of the eventual observation at each X is over-plotted.

[Figure: scatter plot of the individual CRPS (with nox) against Price ($1,000s).]

Fig. 4 The individual CRPS scores (normalized to 1) by the price of houses (in $1,000s). Lower scores are better. Not surprisingly, scores away from the middle prices are on average worse.

We have thus moved from delightful results as indicated by p-values to more sobering results when testing the model against reality, where we also recall this is only a guess of how the model will actually perform on future data: nox is not useful. Since the possibility of over-fitting is always with us, future skill measures would likely be worse than those seen in the old data.

6 THE FUTURE

As the example suggests, the failure to generate exciting results might explain why, historically, the predictive method was never adopted.


[Figure: scatter plot of the individual CRPS (with nox) against the CRPS (no nox).]

Fig. 5 The individual CRPS scores of the full model, with nox, by the CRPS of the model removing nox. A one-to-one line has been overdrawn. There does not seem to be much if any improvement in scores from adding nox to the model.

[Figure: scatter plot of individual skill against nox.]

Fig. 6 The individual skill scores comparing the models with the old X, with and without nox, as related to nox. A dashed red line at 0, indicating no skill, has been drawn.

Reality is a harsh judge. And if verification seemed judgmental in-sample, imagine the shock when it is learned that verification is downright cruel out-of-sample, i.e. on new, never-before-seen observations. The reality-based approach is therefore bound to lead to disappointment where before there was much joy. Such is often the case in science when, as the saying goes, a beautiful theory meets ugly facts.

There should by now at least be more suspicion among researchers that models have truly identified cause. We have seen the many confusing and disparate ideas about cause which are common in uncertainty models, a class which includes so-called artificial intelligence and machine learning algorithms.


We do not often recognize that nothing can come out of any algorithm which was not placed there, at the beginning, by the algorithm's creator. Algorithms are useful for automating tedious processes, but coming to knowledge of cause requires a much deeper process of thinking. Every scientist believes in confirmation bias; it is just that they always believe it happens to the other guy.

Creators of models and algorithms are one class, users are another. The chance of over-certainty increases in the use and interpretation of models by this latter class, because a user will not on average be as aware of the shortcomings, limitations, and intricacies of models as creators are. All common experience bears out that users of models are more likely to ascribe cause to hypotheses than the more careful creators of models. The so-called replication crisis can in part be put down to non-careful use of models, in particular the unthinking use of hypothesis testing; e.g. [3, 99].

The situation is even worse than it might seem, because besides the formal models considered here, there is another, wider, and more influential class, which we might call media models. There is little to no check on the wild extrapolations that appear in the press (and are taken up in civic life). I have a small collection of headlines reporting on medical papers, each contradicting the other, and all trumpeting that causes have been discovered (via testing); see [10]. One headline: "Bad news for chocoholics: Dark chocolate isn't so healthy for you after all," particularly not good, the story informs, for heart disease. This was followed by another headline three short months later in the same paper saying "Eating dark chocolate is good for your heart." Similar collections for economics studies could easily and all too quickly be compiled.

It could be argued that the ultimate responsibility lies with the people making the wild and over-sure claims. This holds some truth, but the appalling frequency with which this sort of thing happens without any kind of correction from authorities (like you, the reader) implies, to the media, that what they are doing is not wrong.

Bland warnings cannot take the place of outright proscriptions. We must ban the cause of at least some of the over-certainty. No more hypothesis testing. Models must be reported in their predictive form, where anybody (in theory) can check the results, even if they do not have access to the original data. All models which have any claim to sincerity must be tested against reality, first in-sample, then out-of-sample. Reality must take precedence over theory.

References

[1] Amrhein, V., Korner-Nievergelt, F. and Roth, T. (2017). The Earth is Flat (p > 0.05): Significance Thresholds and the Crisis of Unreplicable Research. PeerJ, 5, e3544.

[2] Armstrong, J. S. (2007). Significance Testing Harms Progress in Forecasting (with discussion). International Journal of Forecasting, 23, 321-327.

[3] Begley, C. G. and Ioannidis, J. P. (2015). Reproducibility in Science: Improving the Standard for Basic and Preclinical Research. Circulation Research, 116, 116-126.

[4] Benjamin, D. J., Berger, J. O., Johannesson, M., Nosek, B. A., Wagenmakers, E. J., Berk, R. et al. (2018). Redefine Statistical Significance. Nature Human Behaviour, 2, 6-10.

[5] Berger, J. O. and Sellke, T. (1987). Testing a Point Null Hypothesis: The Irreconcilability of P-values and Evidence. JASA, 82, 112-122.

[6] Bernardo, J. M. and Smith, A. F. M. (2000). Bayesian Theory. Wiley, New York.

[7] Breiman, L. (2001). Statistical Modeling: The Two Cultures. Statistical Science, 16(3), 199-215.

[8] Briggs, W. M. (2006). Broccoli Reduces the Risk of Splenetic Fever! The Use of Induction and Falsifiability in Statistics and Model Selection. arxiv.org/pdf/math.GM/0610859.

[9] Briggs, W. M. (2013). On Probability Leakage. arxiv.org/abs/1201.3611.

[10] Briggs, W. M. (2014). Common Statistical Fallacies. Journal of American Physicians and Surgeons, 19(2), 678-681.

[11] Briggs, W. M. (2016). Uncertainty: The Soul of Probability, Modeling & Statistics. Springer, New York.

[12] Briggs, W. M. (2017). The Substitute for P-values. JASA, 112, 897-898.

[13] Briggs, W. M. (2019). Everything Wrong with P-values under One Roof. In Kreinovich, V., Thach, N. N., Trung, N. D. and Thanh, D. V., editors, Beyond Traditional Probabilistic Methods in Economics, 22-44. Springer, New York.

[14] Briggs, W. M., Nguyen, H. T. and Trafimow, D. (2019). The Replacement for Hypothesis Testing. In V. Kreinovich and S. Sriboonchitta, editors, Structural Changes and Their Econometric Modeling, 3-17. Springer, New York.

[15] Briggs, W. M. and Ruppert, D. (2005). Assessing the Skill of Yes/No Predictions. Biometrics, 61(3), 799-807.

[16] Briggs, W. M. and Zaretzki, R. A. (2008). The Skill Plot: A Graphical Technique for Evaluating Continuous Diagnostic Tests (with discussion). Biometrics, 64, 250-263.

[17] Brocker, J. (2009). Reliability, Sufficiency, and the Decomposition of Proper Scores. Quarterly Journal of the Royal Meteorological Society, 135, 1512-1519.

[18] Brockwell, P. J. and Davis, R. A. (1991). Time Series: Theory and Methods. Springer, New York, NY.


[19] Camerer, C. F., Dreber, A., Forsell, E., Ho, T. H., Johannesson, M., Kirchler, M., Almenberg, J. and Altmejd, A. (2016). Evaluating Replicability of Laboratory Experiments in Economics. Science, 351, 1433-1436.

[20] Campbell, S. and Franklin, J. (2004). Randomness and Induction. Synthese, 138, 79-99.

[21] Carroll, R. J., Ruppert, D., Stefanski, L. A. and Crainiceanu, C. M. (2006). Measurement Error in Nonlinear Models: A Modern Perspective. Chapman and Hall, London.

[22] Chang, A. C. and Li, P. (2015). Is Economics Research Replicable? Sixty Published Papers from Thirteen Journals Say 'Usually Not'. Technical Report 2015-083, Finance and Economics Discussion Series, Divisions of Research & Statistics and Monetary Affairs, Federal Reserve Board, Washington, D.C.

[23] Cintula, P., Fermuller, C. G. and Noguera, C. (2017). Fuzzy Logic. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University.

[24] Clarke, B. S. and Clarke, J. L. (2018). Predictive Statistics. Cambridge University Press.

[25] Cohen, J. (1994). The Earth is Round (p < .05). American Psychologist, 49, 997-1003.

[26] Colquhoun, D. (2014). An Investigation of the False Discovery Rate and the Misinterpretation of P-values. Royal Society Open Science, 1, 1-16.

[27] Cook, T. D. and Campbell, D. T. (1979). Quasi-Experimentation: Design & Analysis Issues for Field Settings. Houghton Mifflin, Boston.

[28] Crosby, A. W. (1997). The Measure of Reality: Quantification and Western Society, 1250-1600. Cambridge University Press.

[29] Duhem, P. (1954). The Aim and Structure of Physical Theory. Princeton University Press.

[30] Einstein, A., Podolsky, P. and Rosen, N. (2001). Testing for Unit Roots: What Should Students be Taught? Journal of Economic Education, 32, 137-146.

[31] Falcon, A. (2015). Aristotle on Causality. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University.

[32] Feser, E. (2010). The Last Superstition: A Refutation of the New Atheism. St. Augustine's Press, South Bend, Indiana.

[33] Feser, E. (2013). Kripke, Ross, and the Immaterial Aspects of Thought. American Catholic Philosophical Quarterly, 87, 1-32.


[34] Feser, E. (2014). Scholastic Metaphysics: A Contemporary Introduction. Editions Scholasticae, Neunkirchen-Seelscheid, Germany.

[35] Franklin, J. (2014). An Aristotelian Realist Philosophy of Mathematics: Mathematics as the Science of Quantity and Structure. Palgrave Macmillan, New York.

[36] Freedman, D. (2005). Linear Statistical Models for Causation: A Critical Review. In Brian S. Everitt and David Howell, editors, Encyclopedia of Statistics in Behavioral Science, 2-13. Wiley, New York.

[37] Geisser, S. (1993). Predictive Inference: An Introduction. Chapman & Hall, New York.

[38] Gelman, A. (2000). Diagnostic Checks for Discrete Data Regression Models using Posterior Predictive Simulations. Applied Statistics, 49(2), 247-268.

[39] Gigerenzer, G. (2004). Mindless Statistics. The Journal of Socio-Economics, 33, 587-606.

[40] Gneiting, T. and Raftery, A. E. (2007). Strictly Proper Scoring Rules, Prediction, and Estimation. JASA, 102, 359-378.

[41] Gneiting, T., Raftery, A. E. and Balabdaoui, F. (2007). Probabilistic Forecasts, Calibration and Sharpness. Journal of the Royal Statistical Society Series B: Statistical Methodology, 69, 243-268.

[42] Goodman, S. N. (2001). Of P-values and Bayes: A Modest Proposal. Epidemiology, 12, 295-297.

[43] Greenland, S. (2000). Causal Analysis in the Health Sciences. JASA, 95, 286-289.

[44] Greenland, S. (2017). The Need for Cognitive Science in Methodology. American Journal of Epidemiology, 186, 639-645.

[45] Greenland, S. (2018). Valid P-values Behave Exactly as They Should: Some Misleading Criticisms of P-values and Their Resolution with S-values. The American Statistician.

[46] Greenland, S., Senn, S. J., Rothman, K. J., Carlin, J. B., Poole, C., Goodman, S. N. and Altman, D. G. (2016). Statistical Tests, P-values, Confidence Intervals, and Power: A Guide to Misinterpretations. European Journal of Epidemiology, 31(4), 337-350.

[47] Greenwald, A. G. (1975). Consequences of Prejudice Against the Null Hypothesis. Psychological Bulletin, 82(1), 1-20.

[48] Groarke, L. (2009). An Aristotelian Account of Induction. McGill-Queen's University Press, Montreal.


[49] Haber, N., Smith, E. R., Moscoe, E., Andrews, K., Audy, R., Bell, W., Brennan, A. T., Breskin, A., Kane, J. C., Karra, M., McClure, E. S. and Suarez, E. A. (2018). Causal Language and Strength of Inference in Academic and Media Articles Shared in Social Media (CLAIMS): A Systematic Review. PLOS One.

[50] Hajek, A. (1997). Mises Redux - Redux: Fifteen Arguments Against Finite Frequentism. Erkenntnis, 45, 209-227.

[51] Hajek, A. (2007). A Philosopher's Guide to Probability. In Uncertainty: Multi-disciplinary Perspectives on Risk, Earthscan.

[52] Hajek, A. (2009). Fifteen Arguments Against Hypothetical Frequentism. Erkenntnis, 70, 211-235.

[53] Harrell, F. (2018). A Litany of Problems with P-values, Aug 2018.

[54] Harrison, D. and Rubinfeld, D. L. (1978). Hedonic Prices and the Demand for Clean Air. Journal of Environmental Economics and Management, 5, 81-102.

[55] Hersbach, H. (2000). Decomposition of the Continuous Ranked Probability Score for Ensemble Prediction Systems. Weather and Forecasting, 15, 559-570.

[56] Hitchcock, C. (2018). Probabilistic Causation. In Edward N. Zalta, editor, The Stanford Encyclopedia of Philosophy. Metaphysics Research Lab, Stanford University.

[57] Gale, R. P., Hochhaus, A. and Zhang, M. J. (2016). What is the (p-)value of the P-value? Leukemia, 30, 1965-1967.

[58] Holmes, S. (2018). Statistical Proof? The Problem of Irreproducibility. Bulletin of the American Mathematical Society, 55, 31-55.

[59] Hubbard, R. and Lindsay, R. M. (2008). Why P Values are not a Useful Measure of Evidence in Statistical Significance Testing. Theory & Psychology, 18, 69-88. Ioannidis, J. P. A. (2005). Why Most Published Research Findings are False. PLoS Medicine, 2(8), e124.

[60] Ioannidis, J. P. A., Stanley, T. D. and Doucouliagos, H. (2017). The Power of Bias in Economics Research. The Economic Journal, 127, F236-F265.

[61] Johnson, W. O. (1996). Predictive Influence in the Log Normal Survival Model. In Modelling and Prediction: Honoring Seymour Geisser, 104-121. Springer.

[62] Johnson, W. O. and Geisser, S. (1982). A Predictive View of the Detection and Characterization of Influential Observations in Regression Analysis. JASA, 78, 427-440.


[63] Kastner, R. E., Kauffman, S. and Epperson, M. (2017). Taking Heisenberg's Potentia Seriously. arXiv e-prints.

[64] Keynes, J. M. (2004). A Treatise on Probability. Dover Phoenix Editions, Mineola, NY.

[65] Konishi, S. and Kitagawa, G. (2007). Information Criteria and Statistical Modeling. Springer, New York.

[66] Lee, J. C., Johnson, W. O. and Zellner, A. (Eds.) (1996). Modelling and Prediction: Honoring Seymour Geisser. Springer, New York.

[67] Lord, F. M. (1967). A Paradox in the Interpretation of Group Comparisons. Psychological Bulletin, 66, 304-305.

[68] Mayr, E. (1992). The Idea of Teleology. Journal of the History of Ideas, 53(1), 117-135.

[69] McShane, B. B., Gal, D., Gelman, A., Robert, C. and Tackett, J. L. (2017). Abandon Statistical Significance. The American Statistician.

[70] Meng, X. L. (2008). Bayesian Analysis. Cyr E. M'Lan and Lawrence Joseph and David B. Wolfson, 3(2), 269-296.

[71] Mulder, J. and Wagenmakers, E. J. (2016). Editor's Introduction to the Special Issue: Bayes Factors for Testing Hypotheses in Psychological Research: Practical Relevance and New Developments. Journal of Mathematical Psychology, 72, 1-5.

[72] Murphy, A. H. (1973). A New Vector Partition of the Probability Score. Journal of Applied Meteorology, 12, 595-600.

[73] Murphy, A. H. (1987). Forecast Verification: Its Complexity and Dimensionality. Monthly Weather Review, 119, 1590-1601.

[74] Murphy, A. H. and Winkler, R. L. (1987). A General Framework for Forecast Verification. Monthly Weather Review, 115, 1330-1338.

[75] Nagel, T. (2012). Mind & Cosmos: Why the Materialist Neo-Darwinian Conception of Nature is Almost Certainly False. Oxford University Press, Oxford.

[76] Neyman, J. (1937). Outline of a Theory of Statistical Estimation Based on the Classical Theory of Probability. Philosophical Transactions of the Royal Society of London A, 236, 333-380.

[77] Nguyen, H. T. (2016). On Evidence Measures of Support for Reasoning with Integrated Uncertainty: A Lesson from the Ban of P-values in Statistical Inference. In Integrated Uncertainty in Knowledge Modelling and Decision Making, 3-15. Springer.


[78] Nguyen, H. T., Sriboonchitta, S. and Thach, N. N. (2019). On Quantum Probability Calculus for Modeling Economic Decisions. In Structural Changes and Their Econometric Modeling. Springer.

[79] Nguyen, H. T. and Walker, A. E. (1994). On Decision Making using Belief Functions. In Advances in the Dempster-Shafer Theory of Evidence, Wiley.

[80] Nosek, B. A., Alter, G., Banks, G. C. and Others (2015). Estimating the Reproducibility of Psychological Science. Science, 349, 1422-1425.

[81] Nuzzo, R. (2015). How Scientists Fool Themselves - and How They Can Stop. Nature, 526, 182-185.

[82] Oderberg, D. (2008). Real Essentialism. Routledge, London.

[83] Pearl, J. (2000). Causality: Models, Reasoning, and Inference. Cambridge University Press, Cambridge.

[84] Peng, R. (2015). The Reproducibility Crisis in Science: A Statistical Counterattack. Significance, 12, 30-32.

[85] Pocernich, M. (2007). Verification: Forecast Verification Utilities for R.

[86] Poole, D. (1989). Explanation and Prediction: An Architecture for Default and Abductive Reasoning. Computational Intelligence, 5(2), 97-110.

[87] Quine, W. V. (1951). Two Dogmas of Empiricism. Philosophical Review, 60, 20-43.

[88] Quine, W. V. (1953). Two Dogmas of Empiricism. Harper and Row, Harper Torchbooks, Evanston, IL.

[89] Russell, B. (1920). Introduction to Mathematical Philosophy. George Allen & Unwin, London.

[90] Sanders, G. (2018). An Aristotelian Approach to Quantum Mechanics.

[91] Schervish, M. (1989). A General Method for Comparing Probability Assessors. Annals of Statistics, 17, 1856-1879.

[92] Shimony, A. (2013). Bell's Theorem. The Stanford Encyclopedia of Philosophy.

[93] Shmueli, G. (2010). To Explain or to Predict? Statistical Science, 25, 289-310.

[94] Stove, D. (1982). Popper and After: Four Modern Irrationalists. Pergamon Press, Oxford.

[95] Stove, D. (1983). The Rationality of Induction. Clarendon, Oxford.

[96] Trafimow, D. (2009). The Theory of Reasoned Action: A Case Study of Falsification in Psychology. Theory & Psychology, 19, 501-518.


[97] Trafimow, D. (2017). The Probability of Simple Versus Complex Causal Models in Causal Analyses. Behavioral Research, 49, 739-746.

[98] Trafimow, D. (2017). Implications of an Initial Empirical Victory for the Truth of the Theory and Additional Empirical Victories. Philosophical Psychology, 30(4), 411-433.

[99] Trafimow, D. et al. (2018). Manipulating the Alpha Level cannot Cure Significance Testing. Frontiers in Psychology, 9, 699.

[100] Trafimow, D. and Marks, M. (2015). Editorial. Basic and Applied Social Psychology, 37(1), 1-2.

[101] Vigen, T. (2018). Spurious Correlations.

[102] Wasserstein, R. L. (2016). The ASA's Statement on P-values: Context, Process, and Purpose. The American Statistician, 70, 129-132.

[103] Wilks, D. S. (2006). Statistical Methods in the Atmospheric Sciences. Academic Press, New York.

[104] Williams, D. (1947). The Ground of Induction. Russell & Russell, New York.

[105] Woodfield, A. (1976). Teleology. Cambridge University Press, Cambridge.

[106] Ziliak, S. T. and McCloskey, D. N. (2008). The Cult of Statistical Significance. University of Michigan Press, Ann Arbor.