
PolyChord: Next Generation Nested Sampling
Sampling, Parameter Estimation and Bayesian Model Comparison

Will Handley (wh260@cam.ac.uk)
Supervisors: Anthony Lasenby & Mike Hobson
Astrophysics Department, Cavendish Laboratory
University of Cambridge

December 11, 2015

Parameter estimation & model comparison

Metropolis-Hastings

Nested Sampling

PolyChord

Applications

Notation

▶ Data: D
▶ Model: M
▶ Parameters: Θ
▶ Likelihood: P(D|Θ,M) = L(Θ)
▶ Posterior: P(Θ|D,M) = P(Θ)
▶ Prior: P(Θ|M) = π(Θ)
▶ Evidence: P(D|M) = Z

Bayes' theorem: Parameter estimation

What does the data tell us about the params Θ of our model M?

Objective: Update our prior information π(Θ) in light of data D.

π(Θ) = P(Θ|M)  --D-->  P(Θ|D,M) = P(Θ)

Solution: Use the likelihood L via Bayes' theorem:

P(Θ|D,M) = P(D|Θ,M) P(Θ|M) / P(D|M)

Posterior = (Likelihood × Prior) / Evidence
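A minimal numerical sketch of this update on a one-dimensional grid; the uniform prior, Gaussian likelihood and grid are hypothetical choices, purely for illustration:

```python
import numpy as np

# Hypothetical 1-D example: parameter theta on a grid, uniform prior on [-5, 5],
# Gaussian likelihood for an observed datum D = 1.2 with noise sigma = 0.5.
theta = np.linspace(-5.0, 5.0, 1001)
prior = np.ones_like(theta) / (theta[-1] - theta[0])          # pi(theta), normalised
likelihood = np.exp(-0.5 * ((1.2 - theta) / 0.5) ** 2)        # L(theta)

evidence = np.trapz(likelihood * prior, theta)                # Z = integral of L * pi
posterior = likelihood * prior / evidence                     # P = L * pi / Z

print("evidence Z =", evidence)
print("posterior integrates to", np.trapz(posterior, theta))  # ~= 1
```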

Bayes' theorem: Model comparison

What does the data tell us about our model Mi in relation to other models {M1, M2, ...}?

P(Mi)  --D-->  P(Mi|D)

P(Mi|D) = P(D|Mi) P(Mi) / P(D)

P(D|Mi) = Zi = Evidence of Mi

Parameter estimation & model comparison: The challenge

Parameter estimation: what does the data tell us about a model? (Computing posteriors)

Model comparison: what does the data tell us about all models? (Computing evidences)

Both of these are challenging things to compute.

▶ Markov-Chain Monte-Carlo (MCMC) can solve the first of these (kind of).
▶ Nested sampling (NS) promises to solve both simultaneously.

Parameter estimation & model comparison: Why is it difficult?

1. In high dimensions, the posterior P occupies a vanishingly small region of the prior π.

2. Worse, you don't know where this region is.

▶ Describing an N-dimensional posterior fully is impossible.
▶ At best we can project/marginalise into 2 or 3 dimensions.
▶ Sampling the posterior is an excellent compression scheme.

Markov-Chain Monte-Carlo (MCMC): Metropolis-Hastings, Gibbs, Hamiltonian...

▶ Turn the N-dimensional problem into a one-dimensional one.
▶ Explore the space via a biased random walk:

1. Pick a random direction.
2. Choose a step length.
3. If uphill, make the step...
4. ...otherwise sometimes make the step.
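A minimal random-walk Metropolis-Hastings sketch of these four steps; the 2-D Gaussian target, step size and chain length are illustrative assumptions, not the talk's code:

```python
import numpy as np

rng = np.random.default_rng(0)

def loglike(theta):
    """Hypothetical target: a standard 2-D Gaussian log-density."""
    return -0.5 * np.sum(theta ** 2)

def metropolis_hastings(theta0, nsteps=10_000, step=0.5):
    theta = np.asarray(theta0, dtype=float)
    logl = loglike(theta)
    chain = []
    for _ in range(nsteps):
        # 1) pick a random direction and 2) a step length (isotropic Gaussian proposal)
        proposal = theta + step * rng.standard_normal(theta.shape)
        logl_prop = loglike(proposal)
        # 3) if uphill, accept; 4) otherwise accept with probability L'/L
        if np.log(rng.uniform()) < logl_prop - logl:
            theta, logl = proposal, logl_prop
        chain.append(theta.copy())
    return np.array(chain)

chain = metropolis_hastings([3.0, -3.0])
print("posterior mean estimate:", chain[2000:].mean(axis=0))  # after discarding burn-in
```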

MCMC in action

[Figure: MCMC samples of a toy problem in the (µ, A) plane, alongside the corresponding signal as a function of position.]

When MCMC fails: Burn in

[Figure: the same toy problem, showing the chain's burn-in phase before it reaches the posterior bulk.]

When MCMC fails: Tuning the proposal distribution

[Figure: the same toy problem, illustrating the need to tune the proposal distribution.]

When MCMC fails: Multimodality

[Figure: a multimodal version of the toy posterior in (µ, A), which a single chain struggles to explore.]

When MCMC fails: Phase transitions

[Figure: likelihood L against prior volume X for a likelihood exhibiting a phase transition.]

When MCMC fails: The real reason...

▶ MCMC does not give you evidences!

Z = P(D|M) = ∫ P(D|Θ,M) P(Θ|M) dΘ = ∫ L(Θ) π(Θ) dΘ = ⟨L⟩_π

▶ MCMC fundamentally explores the posterior, and cannot average over the prior.

Nested Sampling: John Skilling's alternative to MCMC!

New procedure: maintain a set S of n samples (the "live points"), which are sequentially updated:

S0: Generate n samples from the prior π.
Si+1: Delete the lowest-likelihood sample in Si, and replace it with a new sample of higher likelihood.

Requires one to be able to sample from the prior, subject to a hard likelihood constraint.
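A schematic nested-sampling loop following this procedure, using naive rejection sampling from the prior to satisfy the hard likelihood constraint (a toy choice that works for this Gaussian example, but is exactly the step PolyChord replaces); all names and settings are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)
ndims, nlive, niter = 2, 100, 500

def loglike(theta):
    """Toy Gaussian likelihood centred on the origin."""
    return -0.5 * np.sum(theta ** 2, axis=-1)

def sample_prior(size=None):
    """Uniform prior on [-5, 5]^ndims."""
    shape = (size, ndims) if size else (ndims,)
    return rng.uniform(-5.0, 5.0, size=shape)

# S0: generate nlive samples from the prior
live = sample_prior(nlive)
live_logl = loglike(live)
dead_logl = []

for _ in range(niter):
    worst = np.argmin(live_logl)            # delete the lowest-likelihood live point ...
    lstar = live_logl[worst]
    dead_logl.append(lstar)
    while True:                             # ... and replace it with a prior sample with L > L*
        candidate = sample_prior()          # (naive rejection sampling; the hard part in general)
        if loglike(candidate) > lstar:
            break
    live[worst], live_logl[worst] = candidate, loglike(candidate)

print("final likelihood threshold:", dead_logl[-1])
```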

Nested Sampling: Graphical aid

[Figure sequence: the live points, initially drawn from the prior, progressively contract within ever-higher iso-likelihood contours as the lowest-likelihood point is replaced at each iteration.]

Nested Sampling: Why bother?

▶ At each iteration, the likelihood contour will shrink in volume by a factor of ≈ 1/n.
▶ Nested sampling zooms in to the peak of the posterior exponentially.
▶ Nested sampling can be used to get evidences!

Calculating evidences

▶ Transform to a one-dimensional integral using π(θ) dθ = dX:

Z = ∫ L(θ) π(θ) dθ = ∫ L(X) dX

▶ X is the prior volume:

X(L) = ∫_{L(θ)>L} π(θ) dθ

▶ i.e. the fraction of the prior which the iso-likelihood contour L encloses.

Nested Sampling: Calculating evidences

[Figure: the likelihood L as a function of prior volume X, with the discarded points defining decreasing volumes X1 > X2 > ... > X5 at increasing likelihoods L1 < L2 < ... < L5.]

∫ L(X) dX ≈ Σ_i L_i (X_{i−1} − X_i) = Σ_i L_i w_i,   with w_i = X_{i−1} − X_i
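A minimal sketch of this quadrature, assuming the dead-point log-likelihoods come from a run with n live points and using the deterministic volume assignment X_i ≈ e^{−i/n}; the function and variable names are illustrative:

```python
import numpy as np

def log_evidence(dead_logl, nlive):
    """Quadrature Z ~= sum_i L_i * w_i with w_i = X_{i-1} - X_i and X_i = exp(-i/nlive)."""
    dead_logl = np.asarray(dead_logl)
    i = np.arange(1, len(dead_logl) + 1)
    X = np.exp(-i / nlive)                      # deterministic estimate of the prior volumes
    w = np.concatenate([[1.0], X[:-1]]) - X     # w_i = X_{i-1} - X_i, with X_0 = 1
    # accumulate log(L_i) + log(w_i) with log-sum-exp for numerical stability
    return np.logaddexp.reduce(dead_logl + np.log(w))

# hypothetical dead-point log-likelihoods, in the order they were discarded
dead_logl = np.linspace(-50.0, -1.0, 500)
print("log Z ~=", log_evidence(dead_logl, nlive=100))
```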

Estimating evidences: Evidence error

[Figure: likelihood L plotted against log X; the posterior mass X·L peaks after a compression of roughly log X ≈ −H.]

Z = ∫ L dX = ∫ X L d(log X)

H = ∫ log(dP/dX) dP

Estimating evidences: Evidence error

▶ Approximate compression per iteration:

Δ log X ∼ −1/n ± 1/n,    log X_i ∼ −i/n ± √i/n

▶ Number of steps to reach the posterior bulk at H:  i_H ∼ nH

▶ Estimate of the volume there:  log X_H ≈ −H ± √(H/n)

▶ Estimate of the evidence and its error:  log Z ≈ log Σ_i w_i L_i ± √(H/n)
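Continuing the sketch above, the information H and the √(H/n) error bar can be estimated from the same dead points; this is a simplified version of the bookkeeping, with assumed names:

```python
import numpy as np

def log_evidence_with_error(dead_logl, nlive):
    """Return (log Z, error on log Z), with the error taken as sqrt(H / nlive)."""
    dead_logl = np.asarray(dead_logl)
    i = np.arange(1, len(dead_logl) + 1)
    X = np.exp(-i / nlive)
    w = np.concatenate([[1.0], X[:-1]]) - X          # w_i = X_{i-1} - X_i
    logZ = np.logaddexp.reduce(dead_logl + np.log(w))
    p = np.exp(dead_logl + np.log(w) - logZ)         # posterior weight of each dead point
    H = np.sum(p * (dead_logl - logZ))               # information: H = sum_i p_i log(L_i / Z)
    return logZ, np.sqrt(H / nlive)

dead_logl = np.linspace(-50.0, -1.0, 500)            # hypothetical run, as above
logZ, err = log_evidence_with_error(dead_logl, nlive=100)
print(f"log Z ~= {logZ:.2f} +/- {err:.2f}")
```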

Nested sampling: Parameter estimation

▶ NS can also be used to sample the posterior.
▶ The set of dead points are posterior samples with an appropriate weighting factor.
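A sketch of that weighting, with assumed names: each dead point receives posterior weight p_i ∝ L_i w_i, which can be used directly or resampled into equally weighted posterior samples:

```python
import numpy as np

rng = np.random.default_rng(2)

def posterior_samples(dead_points, dead_logl, nlive, nsamples=1000):
    """Resample the dead points with posterior weights p_i proportional to L_i * w_i."""
    dead_logl = np.asarray(dead_logl)
    i = np.arange(1, len(dead_logl) + 1)
    X = np.exp(-i / nlive)
    w = np.concatenate([[1.0], X[:-1]]) - X
    logp = dead_logl + np.log(w)
    p = np.exp(logp - np.logaddexp.reduce(logp))     # normalised posterior weights
    idx = rng.choice(len(dead_logl), size=nsamples, p=p)
    return np.asarray(dead_points)[idx]

# hypothetical one-dimensional dead points and their log-likelihoods
theta = np.linspace(4.0, 0.0, 500)[:, None]
samples = posterior_samples(theta, -0.5 * theta[:, 0] ** 2, nlive=100)
print("posterior mean:", samples.mean(axis=0))
```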

When NS succeeds

[Figure: nested sampling applied to the same toy problems, including the multimodal case; the posterior in (µ, A) and the corresponding signal against position are recovered.]

Sampling from a hard likelihood constraint

"It is not the purpose of this introductory paper to develop the technology of navigation within such a volume. We merely note that exploring a hard-edged likelihood-constrained domain should prove to be neither more nor less demanding than exploring a likelihood-weighted space."

— John Skilling

Sampling within an iso-likelihood contour: Previous attempts

Rejection sampling (MultiNest; F. Feroz & M. Hobson 2009)
▶ Suffers in high dimensions.

Hamiltonian sampling (F. Feroz & J. Skilling 2013)
▶ Requires gradients and tuning.

Diffusion nested sampling (B. Brewer et al. 2009)
▶ Very promising.
▶ Too many tuning parameters.

PolyChord: "Hit and run" slice sampling

[Figure sequence: a chain of points x0 → x1 → ... → xN generated within the iso-likelihood contour L0.]

1) From an initial starting point x0, choose a random direction.
2) Place a random initial bound of width w.
3) Extend the bounds by a "stepping out" procedure.
4) Sample uniformly along this chord. If the draw falls outside L0, contract the bounds and draw again until it lies within L0; this point is x1.
5) Repeat this procedure, starting from x1, until a de-correlated point xN is obtained.
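A minimal single-step sketch of this chord-drawing procedure (Neal-style slice sampling along a random direction under the hard constraint L > L0); the toy likelihood, the width w and all names are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)

def loglike(x):
    """Toy Gaussian log-likelihood (illustrative only)."""
    return -0.5 * np.sum(x ** 2)

def slice_step(x0, logl_star, w=1.0, max_steps=100):
    """One chord draw: return a point with loglike > logl_star, starting from x0,
    which must itself satisfy the constraint."""
    nhat = rng.standard_normal(x0.shape)
    nhat /= np.linalg.norm(nhat)               # 1) choose a random direction
    lo = -w * rng.uniform()                    # 2) random initial bound of width w around x0
    hi = lo + w
    while loglike(x0 + lo * nhat) > logl_star and max_steps > 0:   # 3) step out ...
        lo -= w
        max_steps -= 1
    while loglike(x0 + hi * nhat) > logl_star and max_steps > 0:   # ... in both directions
        hi += w
        max_steps -= 1
    while True:                                # 4) sample uniformly along the chord
        t = rng.uniform(lo, hi)
        x1 = x0 + t * nhat
        if loglike(x1) > logl_star:
            return x1
        if t < 0:                              # outside L0: contract the bound and redraw
            lo = t
        else:
            hi = t

x0 = np.array([0.5, -0.2])
x1 = slice_step(x0, logl_star=loglike(np.array([1.0, 1.0])))
print("new point inside the contour:", x1, loglike(x1))
```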

PolyChord: Key points

▶ This procedure satisfies detailed balance.
▶ Works even if the L0 contour is disjoint.
▶ Need N reasonably large, ∼ O(ndims), so that xN is de-correlated from x1.

Issues with Slice Sampling

1. Does not deal well with correlated distributions.

2. Need to “tune” the w parameter.

PolyChord's solutions: Correlated distributions

[Figure: a contour from a correlated distribution in x is mapped by an affine transformation y = Lx to a contour with dimensions ∼ O(1) in y.]

PolyChord's solutions: Correlated distributions

▶ We make an affine transformation to remove degeneracies, and "whiten" the space.
▶ Samples remain uniformly sampled.
▶ We use the covariance matrix of the live points and all inter-chain points.
▶ Cholesky decomposition is the required skew transformation.
▶ w = 1 in this transformed space.
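A sketch of this whitening step, with assumed names and conventions: take the Cholesky factor of the empirical live-point covariance and use it to map between the whitened space (where w ≈ 1 is appropriate) and the original parameter space:

```python
import numpy as np

rng = np.random.default_rng(4)

# hypothetical strongly correlated live points in the original parameter space
cov_true = np.array([[1.0, 0.95], [0.95, 1.0]])
live = rng.multivariate_normal([0.0, 0.0], cov_true, size=200)

# Cholesky factor of the empirical live-point covariance; y = solve(L, x) whitens the space
L = np.linalg.cholesky(np.cov(live, rowvar=False))
whitened = np.linalg.solve(L, live.T).T
print("whitened covariance (should be close to the identity):")
print(np.cov(whitened, rowvar=False).round(2))

# a unit chord step in the whitened space (w = 1) maps back to a correlated step
step_whitened = np.array([1.0, 0.0])
print("corresponding step in the original space:", L @ step_whitened)
```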

PolyChord's Additions

▶ Parallelised up to the number of live points with openMPI.
▶ Novel method for identifying and evolving modes separately.
▶ Implemented in CosmoMC, as "CosmoChord", with fast-slow parameters.

PolyChord in action: Primordial power spectrum PR(k) reconstruction

[Figure sequence: log PR(k) against log k. The standard power law As (k/k*)^(ns−1) is generalised to a spectrum specified by movable knots (k1, P1), (k2, P2), ..., (kNknots, PNknots).]
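A hypothetical sketch of such a knot parametrisation, assuming, for illustration, piecewise-linear interpolation of log PR in log k between the knots (the exact interpolation scheme is an assumption here):

```python
import numpy as np

def log_PR(k, knots_logk, knots_logP):
    """Piecewise-linear interpolation of log PR(k) in log k between movable knots.
    Outside the outermost knots, np.interp simply holds the end values
    (an illustrative convention, not necessarily the talk's)."""
    return np.interp(np.log(k), knots_logk, knots_logP)

# hypothetical knot positions and amplitudes bracketing the Planck k-range
knots_logk = np.log([1e-4, 1e-3, 1e-2, 1e-1])
knots_logP = np.log([2.0e-9, 2.2e-9, 2.1e-9, 2.0e-9])

k = np.logspace(-4, -1, 5)
print(np.exp(log_PR(k, knots_logk, knots_logP)))
```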

Planck data: Primordial power spectrum PR(k) reconstruction

▶ Temperature data TT+lowP.
▶ Foreground (14) & cosmological (4 + 2·Nknots − 2) parameters.
▶ Marginalised plots of PR(k):

P(PR | k, Nknots) = ∫ δ(PR − f(k; θ)) P(θ) dθ
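The marginalised plot defined by this integral can be built directly from posterior samples: evaluate f(k; θ) for every sample and summarise the resulting distribution at each k. A minimal sketch with assumed names (the two-knot toy model is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

def marginalised_band(k, theta_samples, f, quantiles=(0.16, 0.5, 0.84)):
    """P(PR | k): push every posterior sample theta through f(k; theta),
    then summarise the resulting distribution at each k with quantiles."""
    values = np.array([f(k, theta) for theta in theta_samples])
    return np.quantile(values, quantiles, axis=0)

# hypothetical posterior samples of two knot amplitudes (knot positions held fixed)
knots_logk = np.log([1e-4, 1e-1])
theta_samples = rng.normal(np.log(2.1e-9), 0.05, size=(500, 2))

def f(k, theta):
    # log PR(k) for one posterior sample: interpolate between the two knots
    return np.interp(np.log(k), knots_logk, theta)

k = np.logspace(-4, -1, 50)
lo, med, hi = marginalised_band(k, theta_samples, f)
print("68% band width at k = 1e-2:", (hi - lo)[np.argmin(np.abs(k - 1e-2))])
```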

0-8 internal knots: Primordial power spectrum PR(k) reconstruction

[Figure sequence: marginalised posteriors of log(10^10 PR) as a function of wavenumber (axis labelled k/Mpc, spanning 10^-4 to 10^-1, with multipole ℓ from 10 to 1000 along the top), for 0 through 8 internal knots.]

Bayes Factors: Primordial power spectrum PR(k) reconstruction

[Figure: Bayes factor B_{0,N} with respect to ΛCDM as a function of the number of internal knots N = 0-8, with values ranging from about −4 to 0.5.]

Marginalised plot: Primordial power spectrum PR(k) reconstruction

[Figure: the marginalised reconstruction of log(10^10 PR) against k, over the same range as above.]

Object detection: Toy problem

[Figure: toy object-detection dataset.]

Object detection: Evidences

▶ log Z ratio:  −251 : −156 : −114 : −117 : −136
▶ Odds ratio:  10^−60 : 10^−19 : 1 : 0.04 : 10^−10
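These odds follow from the log-evidences by exponentiating differences relative to the best model (assuming equal prior model probabilities); a quick check of the slide's numbers:

```python
import numpy as np

# log-evidences from the slide (natural log), one per candidate model
logZ = np.array([-251.0, -156.0, -114.0, -117.0, -136.0])

# posterior odds relative to the best model, assuming equal prior model probabilities
odds = np.exp(logZ - logZ.max())
print(odds)                 # ~ [3e-60, 6e-19, 1, 0.05, 3e-10], matching the slide's rounding

# or, equivalently, normalised posterior model probabilities
print(odds / odds.sum())
```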

PolyChord vs. MultiNest: Gaussian likelihood

[Figure: number of likelihood evaluations NL (10^2 to 10^10, log scale) against number of dimensions D (1 to 512) for PolyChord (1.0 and 2.0) and MultiNest; PolyChord scales more favourably with dimension.]

Conclusions: The future of nested sampling

▶ We are at the beginning of a new era of sampling algorithms.
▶ There is plenty more work to be done in exploring new versions of nested sampling.
▶ Nested sampling is just the beginning.

▶ arXiv:1506.00171
▶ http://ccpforge.cse.rl.ac.uk/gf/project/polychord/
