Ockham’s Razor: What it is, What it isn’t, How it works, and How it doesn’t Kevin T. Kelly...
Date posted: 21-Dec-2015
Ockham’s Razor: What it is, What it isn’t, How it works, and How it doesn’t
Kevin T. Kelly
Department of Philosophy
Carnegie Mellon University
www.cmu.edu
Further Reading
• “Efficient Convergence Implies Ockham's Razor”, Proceedings of the 2002 International Workshop on Computational Models of Scientific Reasoning and Applications, Las Vegas, USA, June 24-27, 2002.
• (with C. Glymour) “Why Probability Does Not Capture the Logic of Scientific Justification”, C. Hitchcock, ed., Contemporary Debates in the Philosophy of Science, Oxford: Blackwell, 2004.
• “Justification as Truth-finding Efficiency: How Ockham's Razor Works”, Minds and Machines 14: 2004, pp. 485-505.
• “Learning, Simplicity, Truth, and Misinformation”, The Philosophy of Information, under review.
• “Ockham's Razor, Efficiency, and the Infinite Game of Science”, proceedings, Foundations of the Formal Sciences 2004: Infinite Game Theory, Springer, under review.
Dilemma
If you know the truth is simple, then you don’t need Ockham.
[Figure: theories T1, T2, T3, T4, T5 ordered from Simple to Complex]
Dilemma
If you don’t know the truth is simple, then how could a fixed simplicity bias help you if the truth is complex?
[Figure: theories T1, T2, T3, T4, T5 ordered from Simple to Complex]
Puzzle
A fixed bias is like a broken thermometer.
How could it possibly help you find unknown truth?
Cold!
Wishful Thinking
Simple theories are nice if true:
• Testability
• Unity
• Best explanation
• Aesthetic appeal
• Compress data
So is believing that you are the emperor.
Overfitting
• Maximum likelihood estimates based on overly complex theories can have greater predictive error (AIC, cross-validation, etc.).
• The same is true even if you know the true model is complex.
• Doesn’t converge to the true model.
• Depends on random data.
“Thanks, but a simpler model still has lower predictive error.”
“The truth is complex.” -God-
Messy worlds are legion
Tidy worlds are few.
That is why the tidy worlds
Are those most likely true. (Carnap)
[Figure label: unity]
Ignorance = Knowledge
[Figure: probabilities 1/3, 1/3, 1/3]
Ignorance = Knowledge
[Figure: probabilities 2/6, 1/6, 2/6, 1/6]
Depends on Notation
But mess depends on coding,
which Goodman noticed, too.
The picture is inverted if
we translate green to grue.
[Figure: probabilities 2/6, 1/6, 2/6, 1/6. Caption: Notation indicates truth?]
Same for Algorithmic Complexity
• Goodman’s problem works against every fixed simplicity ranking (independent of the processes by which data are generated and coded prior to learning).
• Extra problem: any pair-wise ranking of theories can be reversed by choosing an alternative computer language.
• So how could simplicity help us find the true theory?
[Figure caption: Notation indicates truth?]
Just Beg the Question
• Assign high prior probability to simple theories.
• Why should you?
• Preference for complexity has the same “explanation”.
You presume simplicity. Therefore you should presume simplicity!
Miracle Argument
Simple data would be a miracle if a complex theory were true (Bayes, BIC, Putnam).
Begs the Question
[Figure: S and C]
“Fairness” between theories implies bias against complex worlds.
Two Can Play That Game
[Figure: S and C]
“Fairness” between worlds implies bias against simple theory.
Convergence
• At least a simplicity bias doesn’t prevent convergence to the truth (MDL, BIC, Bayes, SGS, etc.).
• Neither do other biases.
• May as well recommend flat tires since they can be fixed.
[Figure: labels “Simple” and “Complex”]
Does Ockham Have No Frock?
Philosopher’s stone, perpetual motion, free lunch, Ockham’s Razor???
Ash Heap of History
What is Guidance?
• Indication or tracking: too strong. A fixed bias can’t indicate anything.
• Convergence: too weak. True of other biases.
• “Straightest” convergence: just right?
[Figure: paths between S and C]
What Does She Say?
“Turn around. The freeway ramp is on the left.”
You Have Better Ideas
“Phooey! The Sun was on the right!”
The Best Route Anywhere from There
[Figure: routes to Pittsburgh, NY, DC]
“Told ya!”
The Freeway to the Truth
[Figure: freeway between Complex and Simple]
• Fixed advice for all destinations
• Disregarding it entails an extra course reversal…
• …even if the advice points away from the goal!
“Told ya!”
Ockham’s Razor
3 ?
If you answer, answer with the current count.
Analogy
• Marbles = detectable “effects”.
• Late appearance = difficulty of detection.
• Count = model (e.g., causal graph).
• Appearance times = free parameters.
Analogy
• U-turn = model revision (with content loss)
• Highway = revision-efficient truth-finding method.
[Figure labels: T, T]
The U-turn Argument
• Suppose you converge to the truth but violate Ockham’s razor along the way. [Figure: answer 3]
• Where is that extra marble, anyway? [Figure: answer 3]
• If you never say 2 you’ll never converge to the truth…. [Figure: answer 3]
• That’s it. You should have listened to Ockham. [Figure: answer 3, then 2]
• Oops! Well, no method is infallible! [Figure: answer 3, then 2]
• If you never say 3, you’ll never converge to the truth…. [Figure: answer 3, then 2]
• Embarrassing to be back at that old theory, eh? [Figure: answer 3, then 2, then 3]
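The U-turn argument can be played out as a small game. The sketch below is only an illustration under stated assumptions: the function names, the violator's "patience" of two stages, and the demon's strategy (withhold the next marble until the method answers with the current count) are hypothetical choices, not taken from the slides.

```python
def ockham(seen, stage):
    """Ockham method: answer with the current count."""
    return seen

def violator(seen, stage):
    """Hypothetical violator: guesses one extra marble for two stages, then follows the count."""
    return seen + 1 if stage < 2 else seen

def play(method, seen, max_marbles, horizon):
    """Demon withholds the next marble until the method answers with the current count."""
    answers = []
    for stage in range(horizon):
        answer = method(seen, stage)
        answers.append(answer)
        if answer == seen and seen < max_marbles:
            seen += 1  # demon now reveals another marble
    return answers

def revisions(answers):
    """Number of times the answer changes from one stage to the next."""
    return sum(1 for prev, cur in zip(answers, answers[1:]) if prev != cur)

# Subproblem entered after 2 marbles have been seen; the true count is 3.
print(play(ockham, 2, 3, 6), revisions(play(ockham, 2, 3, 6)))
print(play(violator, 2, 3, 6), revisions(play(violator, 2, 3, 6)))
```

In this run the Ockham method revises once (2, then 3), while the violator is forced through 3, 2, 3 and revises twice, which is the extra U-turn.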
Ockham is Necessary
If you converge to the truth, and you violate Ockham’s razor, then some convergent method beats your worst-case revision bound in each answer in the subproblem entered at the time of the violation.
Ockham is Sufficient
If you converge to the truth, and you never violate Ockham’s razor, then you achieve the worst-case revision bound of each convergent solution in each answer in each subproblem.
Efficiency
Efficiency = achievement of the best worst-case revision bound in each answer in each subproblem.
Ockham Efficiency Theorem
Among the convergent methods… Ockham = Efficient!
[Figure: Efficient vs. Inefficient]
“Mixed” Strategies
• mixed strategy = chance of output depends only on actual experience.
• convergence in probability = chance of producing true answer approaches 1 in the limit.
• efficiency = achievement of best worst-case expected revision bound in each answer in each subproblem.
Ockham Efficiency Theorem
Among the mixed methods that converge in probability… Ockham = Efficient!
[Figure: Efficient vs. Inefficient]
Dominance and “Support”
Every convergent method is weakly dominated in revisions by a clone who says “?” until stage n.
1. Convergence: Must leap eventually.
2. Efficiency: Only leap to simplest.
3. Dominance: Could always wait longer.
Can’t wait forever!
Ockham Wish List
• General definition of Ockham’s razor.
• Compare revisions even when not bounded within answers.
• Prove theorem for arbitrary empirical problems.
Empirical Problems
• Problem = partition of a topological space.
• Potential answers = partition cells.
• Evidence = open (verifiable) propositions.
Example: Symmetry
Example: Parameter Freeing
• Euclidean topology.
• Say which parameters are zero.
• Evidence = open neighborhood.
• Curve fitting
[Figure: (a1, a2) plane with origin 0, partitioned into the answers (a1 = 0, a2 = 0), (a1 > 0, a2 = 0), (a1 > 0, a2 > 0), (a1 = 0, a2 > 0)]
The Players
• Scientist: Produces an answer in response to current evidence.
• Demon: Chooses evidence in response to scientist’s choices.
Winning
Scientist wins…
• by default if demon doesn’t present an infinite nested sequence of basic open sets whose intersection is a singleton.
• else by merit if scientist eventually always produces the true answer for world selected by demon’s choices.
Comparing Revisions
One answer sequence maps into another iff there is an order and answer-preserving map from the first to the second (? is wild).
Then the revisions of the first are as good as those of the second.
The revisions of the first are strictly better if, in addition, the latter doesn’t map back into the former.
[Figure: answer sequences containing “?” entries, mapped into one another]
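For finite answer sequences, the "maps into" relation can be checked with a greedy left-to-right scan. This is a minimal sketch under two assumptions that the slides do not settle: sequences are finite Python lists, and "? is wild" is read as "a ? in the first sequence may be sent to any entry of the second". The function names are hypothetical.

```python
WILD = "?"

def maps_into(first, second):
    """True if an order- and answer-preserving map sends `first` into `second`.

    Assumption: a "?" in `first` may be matched to any entry of `second`.
    """
    j = 0
    for answer in first:
        # Skip entries of `second` that cannot receive this answer.
        while j < len(second) and not (answer == WILD or answer == second[j]):
            j += 1
        if j == len(second):
            return False
        j += 1  # the next answer must land strictly later
    return True

def strictly_better(first, second):
    """`first` maps into `second`, but `second` does not map back."""
    return maps_into(first, second) and not maps_into(second, first)

print(maps_into([3, 4], [2, 3, 4]))     # the shorter sequence embeds
print(strictly_better([3], [3, 2, 3]))  # one answer vs. a U-turn
```

Taking the earliest compatible position at each step is safe here because it leaves the longest possible remainder of the second sequence for the later answers.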
Comparing Methods
• F is as good as G iff each output sequence of F is as good as some output sequence of G.
• F is better than G iff F is as good as G and G is not as good as F.
• F is strongly better than G iff each output sequence of F is strictly better than an output sequence of G but no output sequence of G is as good as any of F.
Terminology
Efficient solution: as good as any solution in any subproblem.
What Simplicity Isn’t
• Syntactic length.
• Data-compression (MDL).
• Computational ease.
• Social “entrenchment” (Goodman).
• Number of free parameters (BIC, AIC).
• Euclidean dimensionality.
Only by accident!! Symptoms…
What Simplicity Is
Simpler theories are compatible with deeper problems of induction.
[Figure: worst demon, smaller demon]
Problem of Induction
• No true information entails the true answer.
• Happens in answer boundaries.
Demonic Paths
A demonic path from w is a sequence of alternating answers that a demon can force an arbitrary convergent method through starting from w.
[Figure: 0…1…2…3…4]
Simplicity Defined
The A-sequences are the demonic sequences beginning with answer A.
A is as simple as B iff each B-sequence is as good as some A-sequence.
[Figure: the sequences (2, 3), (2, 3, 4), (2, 3, 4, 5), … paired by “<” with the sequences (3), (3, 4), (3, 4, 5), …]
So 2 is simpler than 3!
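The comparison of 2 with 3 can be checked mechanically on a finite truncation of the counting problem. The sketch below rests on assumptions that go beyond the slides: the demonic sequences beginning with k are taken to be the runs k, k+1, …, m up to a hypothetical cap `max_count`, and "as good as" is read as "embeds as a subsequence" (no "?" entries occur in these sequences).

```python
def is_subsequence(short, long):
    """True if `short` embeds in `long` preserving order."""
    remaining = iter(long)
    return all(item in remaining for item in short)

def demonic_sequences(start, max_count):
    """Assumed demonic sequences from `start`: (start), (start, start+1), ... up to max_count."""
    return [tuple(range(start, end + 1)) for end in range(start, max_count + 1)]

def as_simple_as(a, b, max_count):
    """A is as simple as B iff each B-sequence is as good as some A-sequence."""
    a_sequences = demonic_sequences(a, max_count)
    return all(
        any(is_subsequence(b_seq, a_seq) for a_seq in a_sequences)
        for b_seq in demonic_sequences(b, max_count)
    )

print(as_simple_as(2, 3, max_count=6))  # is 2 as simple as 3?
print(as_simple_as(3, 2, max_count=6))  # is 3 as simple as 2?
```

Under these assumptions 2 is as simple as 3 but not conversely, since no sequence that starts at 3 can receive the answer 2.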
Ockham Answer
• An answer as simple as any other answer.
• = number of observed particles.
[Figure: the sequences (2, …, n), (2, …, n, n+1), (2, …, n, n+1, n+2), … paired by “<” with the sequences (n), (n, n+1), (n, n+1, n+2), …]
So 2 is Ockham!
Ockham Lemma
A is Ockham iff for all demonic p, (A*p) ≤ some demonic sequence.
“I can force you through 2 but not through 3, 2.”
So 3 isn’t Ockham.
Ockham Answer
E.g.: Only simplest curve compatible with data is Ockham.
[Figure: (a1, a2) plane with origin 0, showing a demonic sequence and non-demonic sequences]
General Efficiency Theorem
If the topology is metrizable and separable and the question is countable then: Ockham = Efficient.
Proof: uses Martin’s Borel Determinacy theorem.
Stacked Problems
There is an Ockham answer at every stage.
[Figure: stack of answers 0, 1, 2, 3, . . .]
Non-Ockham ⇒ Strongly Worse
If the problem is a stacked countable partition over a restricted Polish space:
Each Ockham solution is strongly better than each non-Ockham solution in the subproblem entered at the time of the violation.
Simplicity ≠ Low Dimension
• Suppose God says the true parameter value is rational.
• Topological dimension and integration theory dissolve.
• Does Ockham?
• The proposed account survives in the preserved limit point structure.
Respect for Symmetry
If several simplest alternatives are available, don’t break the symmetry.
• Count the marbles of each color.
• You hear the first marble but don’t see it.
• Why red rather than green?
Respect for Symmetry
• Before the noise, (0, 0) is Ockham.
• After the noise, no answer is Ockham:
Demonic: (0, 1); (1, 0)
Non-demonic: (0, 1) (1, 0); (1, 0) (0, 1)
[Figure: after (0, 0), the answer “?” is marked “Right!”]
Goodman’s Riddle
• Count oneicles: a oneicle is a particle at any stage but one, when it is a non-particle.
• Oneicle translation is an auto-homeomorphism that does not preserve the problem.
• Unique Ockham answer is current oneicle count.
• Contradicts unique Ockham answer in particle counting.
Supersymmetry
• Say when each particle appears.
• Refines counting problem.
• Every auto-homeomorphism preserves problem.
• No answer is Ockham.
• No solution is Ockham.
• No method is efficient.
Dual Supersymmetry
• Say only whether particle count is even or odd.
• Coarsens counting problem.
• Particle/Oneicle auto-homeomorphism preserves problem.
• Every answer is Ockham.
• Every solution is Ockham.
• Every solution is efficient.
Broken Symmetry
• Count the even or just report odd.
• Coarsens counting problem.
• Refines the even/odd problem.
• Unique Ockham answer at each stage.
• Exactly Ockham solutions are efficient.
Simplicity Under Refinement
[Figure: problems ordered by refinement, from finest to coarsest.
Time of particle appearance. Supersymmetry: no answer is Ockham.
Particle counting; oneicle counting; twoicle counting.
Particle counting or odd particles; oneicle counting or odd oneicles; twoicle counting or odd twoicles. Broken symmetry: unique Ockham answer.
Even/odd. Dual supersymmetry: both answers are Ockham.]
Proposed Theory is Right
• Objective efficiency is grounded in problems.
• Symmetries in the preceding problems would wash out stronger simplicity distinctions.
• Hence, such distinctions would amount to mere conventions (like coordinate axes) that couldn’t have anything to do with objective efficiency.
Furthermore…
If Ockham’s razor is forced to choose in the supersymmetrical problems then either:
• following Ockham’s razor increases revisions in some counting problems, or
• Ockham’s razor leads to contradictions as a problem is coarsened or refined.
What Ockham’s Razor Is
• “Only output Ockham answers”
• Ockham answer = a topological invariant of the empirical problem addressed.
What it Isn’t
Preference for brevity, computational ease, entrenchment, past success, Kolmogorov complexity, dimensionality, etc.
How it Works
Ockham’s razor is necessary for minimizing revisions prior to convergence to the truth.
How it Doesn’t
No possible method could:
• Point at the truth;
• Indicate the truth;
• Bound the probability of error;
• Bound the number of future revisions.
Spooky Ockham
Science without support or safety nets.
“Mixed” Strategies
mixed strategy = chance of output depends only on actual experience:
P_e(M = H at n) = P_{e|n}(M = H at n).
Stochastic Case
• Ockham = at each stage, you produce a non-Ockham answer with prob = 0.
• Efficiency = achievement of the best worst-case expected revision bound in each answer in each subproblem over all methods that converge to the truth in probability.
Stochastic Efficiency Theorem
Among the stochastic methods that converge in probability, Ockham = Efficient!
[Figure: Efficient vs. Inefficient]
Stochastic Methods
Your chance of producing an answer is a function of observations made so far.
[Figure: urn selected in light of observations; answer 2 drawn with chance p]
Stochastic U-turn Argument
• Suppose you converge in probability to the truth but produce a non-Ockham answer with prob > 0. [Figure: answer 3 with chance r > 0]
• Choose small ε > 0. Consider answer 4.
• By convergence in probability to the truth: [Figure: answer 3 with chance r > 0, then answer 2 with chance p > 1 - ε/3]
• Since ε can be chosen arbitrarily small, sup prob of ≥ 3 revisions ≥ r, and sup prob of ≥ 2 revisions = 1. [Figure: answers 3 (r > 0), 2 (p > 1 - ε/3), 3 (p > 1 - ε/3), 4 (p > 1 - ε/3)]
• So sup Exp revisions is ≥ 2 + 3r.
• But for Ockham = 2. [Figure: the same answers within the subproblem]
The Statistical Puzzle of Simplicity
• Assume: Normal distribution, mean μ ≥ 0.
• Question: μ = 0 or μ > 0?
• Intuition: μ = 0 is simpler than μ > 0.
[Figure: distribution with mean μ = 0]
Analogy
• Marbles: potentially small “effects”
• Time: sample size
• Simplicity: fewer free parameters tied to potential “effects”
• Counting: freeing parameters in a model
U-turn “in Probability”
• Convergence in probability: chance of producing true model goes to unity
• Retraction in probability: chance of producing a model drops from above to below 1 -
[Figure: chance (0 to 1) of producing the true model and chance of producing an alternative model, plotted against sample size]
Suppose You (Probably) Choose a Model More Complex than the Truth
[Figure: mean and sample mean; zone for choosing μ > 0. Revision counter: 0]
Eventually You Retract to the Truth (In Probability)
[Figure: mean and sample mean; zone for choosing μ = 0. Revision counter: 1]
So You (Probably) Output an Overly Simple Model Nearby
[Figure: mean and sample mean; zone for choosing μ = 0. Revision counter: 1]
Eventually You Retract to the Truth (In Probability)
[Figure: mean and sample mean; zone for choosing μ > 0. Revision counter: 2]
But Standard (Ockham) Testing Practice Requires Just One Retraction!
[Figure: mean and sample mean; zone for choosing μ = 0. Revision counter: 0]
In The Simplest World, No Retractions
[Figure: mean μ = 0 and sample mean; zone for choosing μ = 0. Revision counter: 0]
In Remaining Worlds, at Most One Retraction
[Figure: mean μ > 0; zone for choosing μ > 0. Revision counter: 0, then 0, then 1]
So Ockham Beats All Violators
• Ockham: at most one revision.
• Violator: at least two revisions in worst case
Summary
• Standard practice is to “test” the point hypothesis rather than the composite alternative.
• This amounts to favoring the “simple” hypothesis a priori.
• It also minimizes revisions in probability!
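The following sketch illustrates why a standard one-sided test makes no U-turn in a world with μ > 0. It assumes unit variance and a fixed cutoff `z` (both are illustrative assumptions, as is the function name): the test chooses μ > 0 when the sample mean exceeds z/√n, so the chance of choosing μ > 0 is 1 − Φ(z − μ√n), which only rises with sample size when μ > 0.

```python
from math import sqrt
from statistics import NormalDist

def chance_of_choosing_complex(mu, n, z=1.645):
    """Chance that the sample mean of n unit-variance draws exceeds z / sqrt(n)."""
    return 1.0 - NormalDist().cdf(z - mu * sqrt(n))

# In a world with a small positive mean, the chance of the complex answer
# climbs steadily with sample size instead of rising and falling back.
for n in (1, 10, 100, 1000, 10000):
    print(n, round(chance_of_choosing_complex(0.1, n), 4))
```

With a fixed cutoff the chance of wrongly choosing μ > 0 when μ = 0 stays near 5% at every sample size; a method that converges in probability would need a cutoff that grows with n, which this sketch does not model.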
Two Dimensional Example
• Assume: independent bivariate normal distribution of unit variance.
• Question: how many components of the joint mean are zero?
• Intuition: more nonzeros = more complex
• Puzzle: How does it help to favor simplicity in less-than-simplest worlds?
A Real Model Selection Method
Bayes Information Criterion (BIC)
BIC(M, sample) = - log(max prob that M can assign to sample) + log(sample size) × model complexity × ½.
BIC method: choose M with least BIC score.
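For the two-dimensional example, the BIC score can be computed directly from the sample mean. This is a minimal sketch with several assumptions of its own: natural logarithms, model complexity counted as the number of freed mean components, terms common to all models dropped from the log-likelihood, and hypothetical function names. A model is written as a tuple of booleans saying which components are free.

```python
import math
from itertools import product

def bic_score(sample_mean, n, free):
    """BIC up to a model-independent constant for unit-variance normal data.

    A component fixed at zero costs n * mean^2 / 2 in negative log-likelihood;
    each free component costs log(n) / 2 in penalty.
    """
    misfit = sum(0.5 * n * m * m for m, is_free in zip(sample_mean, free) if not is_free)
    complexity = sum(free)
    return misfit + 0.5 * complexity * math.log(n)

def bic_choice(sample_mean, n):
    """Choose the model (tuple of 'is this component free?') with least BIC score."""
    models = product([False, True], repeat=len(sample_mean))
    return min(models, key=lambda free: bic_score(sample_mean, n, free))

# Sample mean held at (.05, .005) while the sample size grows.
for n in (2, 100, 30_000, 4_000_000):
    print(n, bic_choice((0.05, 0.005), n))
```

If the sample mean sits exactly at (.05, .005), this choice moves from the simplest model to one free component and then to two as n grows, freeing a component once n·mean² exceeds log(n).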
Official BIC Property
• In the limit, minimizing BIC finds a model with maximal conditional probability when the prior probability is flat over models and fairly flat over parameters within a model.
• But it is also revision-efficient.
AIC in Simplest WorldAIC in Simplest World
nn = 2 = 2 = (0, 0).= (0, 0).
Retractions Retractions = 0= 0
ComplexComplex
SimpleSimple
-2 -1 0 1 2 3
-2
-1
0
1
2
3
AIC in Simplest WorldAIC in Simplest World
nn = 100 = 100 = (0, 0).= (0, 0).
Retractions Retractions = 0= 0
ComplexComplex
SimpleSimple
-0.4 -0.2 0 0.2 0.4
-0.4
-0.2
0
0.2
0.4
AIC in Simplest WorldAIC in Simplest World
nn = 4,000,000 = 4,000,000 = (0, 0).= (0, 0).
Retractions Retractions = 0= 0
ComplexComplex
SimpleSimple
-0.004 -0.002 0 0.002 0.004
-0.004
-0.002
0
0.002
0.004
BIC in Simplest World
n = 2, μ = (0, 0). Retractions = 0.
[Figure: regions labeled Simple and Complex; both axes run from -2 to 3.]
BIC in Simplest World
n = 100, μ = (0, 0). Retractions = 0.
[Figure: regions labeled Simple and Complex; both axes run from -0.75 to 1.]
BIC in Simplest World
n = 4,000,000, μ = (0, 0). Retractions = 0.
[Figure: regions labeled Simple and Complex; both axes run from -0.0075 to 0.0075.]
BIC in Simplest World
n = 20,000,000, μ = (0, 0). Retractions = 0.
[Figure: regions labeled Simple and Complex; both axes run from -0.006 to 0.006.]
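The simplest-world behavior of the two criteria can also be worked out in closed form. The sketch below rests on assumptions that are not stated on the slides: AIC is taken in its standard form, which in the slide's scaling charges 1 per free component where BIC charges ½ log(n); the candidate models are the four subsets of mean components; and the function names are hypothetical. Under those assumptions each component is judged separately, and when the true mean is (0, 0) the quantity n × (sample mean of component j)² is chi-square with one degree of freedom.

```python
import math


def prob_component_kept_zero(threshold):
    """P(n * xbar_j**2 <= threshold) when the true mean component is 0.

    The chi-square CDF with 1 degree of freedom at t is erf(sqrt(t / 2)).
    """
    return math.erf(math.sqrt(threshold / 2.0))


def prob_simple_bic(n):
    """Chance that BIC picks the all-zero model; BIC frees component j
    only when n * xbar_j**2 > log(n)."""
    return prob_component_kept_zero(math.log(n)) ** 2


def prob_simple_aic():
    """Chance that AIC (standard form, assumed) picks the all-zero model;
    AIC frees component j only when n * xbar_j**2 > 2, whatever n is."""
    return prob_component_kept_zero(2.0) ** 2


for n in (2, 100, 4_000_000, 20_000_000):
    print(n, round(prob_simple_bic(n), 4), round(prob_simple_aic(), 4))
```

In this sketch the AIC probability does not depend on n, while the BIC probability of answering "simple" approaches 1 as the sample grows.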
Performance in Complex World
n = 2, μ = (.05, .005). Retractions = 0.
[Figure: regions labeled Simple and Complex, with a region marked 95%; both axes run from -2 to 3.]
Performance in Complex World
n = 100, μ = (.05, .005). Retractions = 0.
[Figure: regions labeled Simple and Complex; both axes run from -0.75 to 1.]
Performance in Complex World
n = 30,000, μ = (.05, .005). Retractions = 1.
[Figure: regions labeled Simple and Complex; both axes run from -0.1 to 0.1.]
Performance in Complex World
n = 4,000,000 (!), μ = (.05, .005). Retractions = 2.
[Figure: regions labeled Simple and Complex; both axes run from -0.04 to 0.04.]
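The retraction counts on these slides can be reproduced with a rough calculation. The sketch below makes a loud simplifying assumption: it replaces the random sample mean by the true mean (.05, .005), so it tracks what BIC "probably says" rather than simulating samples. Under the slide's BIC formula a component is then counted as nonzero exactly when n × (component)² exceeds log(n). The function names are hypothetical.

```python
import math


def bic_complexity_at_mean(mu, n):
    """Number of nonzero components BIC selects if the sample mean equals mu.

    Freeing a component lowers the fit term by n * m**2 / 2 and costs
    (1/2) * log(n), so it pays off only when n * m**2 > log(n).
    """
    return sum(1 for m in mu if n * m * m > math.log(n))


def count_retractions(answers):
    """Count how often the answer changes along the sequence of sample sizes."""
    return sum(1 for a, b in zip(answers, answers[1:]) if a != b)


mu = (0.05, 0.005)
sizes = [2, 100, 30_000, 4_000_000]
answers = [bic_complexity_at_mean(mu, n) for n in sizes]
print(answers, count_retractions(answers))  # prints [0, 0, 1, 2] 2
```

The idealized answers move from zero nonzero components to one and then to two, which agrees with the cumulative retraction counts 0, 0, 1, 2 reported on the four slides.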
Question
Does the statistical retraction minimization story extend to violations in less-than-simplest worlds?
Recall that the deterministic argument for higher retractions required the concept of minimizing retractions in each “subproblem”.
A “subproblem” is a proposition verified at a given time in a given world.
Some analogue “in probability” is required.
Subproblem. H is an α-subproblem in w at n:
There is a likelihood ratio test of {w} at significance < α such that this test has power < 1 - α at each world in H.
[Figure: axis of worlds at sample size = n, showing w, H, an accept zone, and reject zones on either side.]
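A minimal sketch of checking the subproblem condition in the bivariate normal example, under stated assumptions. The test of {w} is taken to be the likelihood ratio test against an unrestricted mean, which rejects when n × |sample mean - w|² exceeds a chi-square critical value with two degrees of freedom. The sketch checks one candidate test at a caller-supplied level below α rather than searching over all tests, and the function names are hypothetical. The series is truncated at 200 terms, which is adequate for moderate noncentrality only.

```python
import math


def chi2_even_cdf(c, m):
    """CDF at c of a central chi-square with 2*m degrees of freedom (m >= 1)."""
    half = c / 2.0
    term, total = 1.0, 0.0
    for i in range(m):
        total += term
        term *= half / (i + 1)
    return max(0.0, 1.0 - math.exp(-half) * total)


def power(w, v, n, level, terms=200):
    """Probability that the level-`level` test of {w} rejects when v is the true mean."""
    # With 2 degrees of freedom, P(chi2 > c) = exp(-c / 2).
    c = -2.0 * math.log(level)
    lam = n * ((v[0] - w[0]) ** 2 + (v[1] - w[1]) ** 2)  # noncentrality
    # Noncentral chi-square as a Poisson(lam / 2) mixture of central chi-squares.
    weight = math.exp(-lam / 2.0)
    accept = 0.0
    for k in range(terms):
        accept += weight * chi2_even_cdf(c, k + 1)
        weight *= (lam / 2.0) / (k + 1)
    return 1.0 - accept


def is_alpha_subproblem(H, w, n, alpha, level):
    """Check the slide's condition for one candidate test with significance `level` < alpha."""
    if not level < alpha:
        raise ValueError("the test's significance must be below alpha")
    return all(power(w, v, n, level) < 1.0 - alpha for v in H)


w = (0.0, 0.0)
print(is_alpha_subproblem([(0.01, 0.0)], w, n=100, alpha=0.05, level=0.04))
print(is_alpha_subproblem([(1.0, 1.0)], w, n=100, alpha=0.05, level=0.04))
```

Worlds close to w are hard to tell apart from w at sample size n, so they stay inside the subproblem; distant worlds are rejected with high power and fall outside it.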
Significance Schedules
A significance schedule α(.) is a monotone decreasing sequence of significance levels converging to zero that drops so slowly that power can be increased monotonically with sample size.
[Figure: levels α(n) and α(n+1) at sample sizes n and n+1.]
Ockham Violation Inefficient
Ockham violation: probably say blue hypothesis at white world (p > α).
[Figure: (X, Y) plane showing the subproblem at sample size n.]
Ockham Violation Inefficient
[Figure: (X, Y) plane showing the subproblem at time of violation; labels: Probably say blue; Probably say white.]
Ockham Violation Inefficient
[Figure: (X, Y) plane showing the subproblem at time of violation; labels, in order: Probably say blue; Probably say white; Probably say blue.]
Local Retraction Efficiency
Ockham does as well as best subproblem performance in some neighborhood of w.
[Figure: (X, Y) plane showing the subproblem; labels: At most one retraction; Two retractions.]
Ockham Violation Inefficient
Note: no neighborhood around w avoids extra retractions.
[Figure: (X, Y) plane showing w and the subproblem at time of violation.]
Gonzo Ockham Inefficient
Gonzo = probably saying simplest answer in entire subproblem entered in simplest world.
[Figure: (X, Y) plane showing the subproblem.]
Balance
Be Ockham (avoid complexity).
Don’t be Gonzo Ockham (avoid bad fit).
Truth-directed: sole aim is to find true model with minimal revisions!
No circles: totally worst-case; no prior bias toward simple worlds.