Strong Lens Time Delay Estimation · Tak (Hyungsuk) Tak SAMSI ASTRO / International CHASC...

Strong Lens Time Delay Estimation

Tak (Hyungsuk) Tak

SAMSI ASTRO / International CHASC Astrostatistics Collaboration

2 Nov 2016

Joint work with Kaisey Mandel (Center for Astrophysics; CfA),David A. van Dyk (Imperial College London), Vinay L. Kashyap (CfA),Xiao-Li Meng (Harvard), and Aneta Siemiginowska (CfA)

1 / 25

Outline

1. Introduction: Strong gravitational lensing

2. Data

3. Our model assumptionsI State-space modelI Distributions of the observed dataI Distributions of the latent data

4. Bayesian: Prior distributions for unknown parameters

5. Bayesian: Full posterior and sampler

6. Frequentist: Profile likelihood

7. Our estimation strategy

8. Example: Time Delay Challenge

9. Microlensing (How to handle multimodality)I ProblemI ModelI Examples 1, 2 (Q0957+561), and 3 (J1029+2623)

10. Conclusion: Time Delay Challenge 2

11. (If time allows) A new MCMC method for multimodality

2 / 25

Introduction

Strong gravitational lensing

Credit: NASA’s Goddard Space Flight Center

The strong gravitational field of a lensing galaxy splits light into two images.

I Light rays take different routes whose lengths can be different.

I Difference between their arrival times → Time delay (∆)

3 / 25

Introduction

Image Credit: NASA/JPL-Caltech

Time delay is used to infer cosmological parameters, e.g.,

I Hubble constant H0 (Refsdal, 1964)

I Equation of state of dark energy (Linder, 2011)

4 / 25

Data

Simulated data of a doubly-lensed quasar , an active black hole(Image Credit: NASA/ESA).

Data are composed of two time series with measurement errors.

I Observation times t ≡ {t1, t2, . . . , tn}>

I Observed magnitudes x ≡ {x1, x2, . . . , xn}>, and y

I Measurement errors (SD) δ ≡ {δ1, δ2, . . . , δn}> and η

Our job is to estimate time delay (shift in the horizontal axis) betweentwo time series.

5 / 25

State-space model

I ∃ latent light curves representing the unobserved true magnitudes incontinuous time (red and blue dashed curves).

X(t) = (X (t1),X (t2), . . . ,X (tn))> and Y(t), values on curves at t

I A curve-shifted model (Pelt et al., 1994):

Y(t) = X(t−∆) + β0,

where the time delay ∆ and magnitude offset β0 are unknown.

6 / 25

Distributions of the observed data

Observed data with independent Gaussian measurement errors

I xj | X (tj)indep.∼ Normal[X (tj), δ

2j ]

I yj | Y (tj)indep.∼ Normal[Y (tj), η

2j ]

yj | X (tj −∆),∆, β0indep.∼ Normal[X (tj −∆) + β0, η

2j ].

I p(x, y | X(t∆),∆, β0)

=∏n

j=1 p[xj | X (tj)]× p[yj | X (tj −∆),∆, β0],

where t∆ denotes the sorted 2n observation times of t and t−∆.

7 / 25

Distributions of the latent data

Latent data with Ornstein-Uhlenbeck (O-U)/Damped RW process

I Many astrophysicists have supported the O-U process; Kelly+

(2009), Kozlowski+ (2010), MacLeod+ (2010), Zu+ (2013),

Tewes+(2013), Hojjati+(2014), Bonvin+(2016), and more!

I dX (t) = − 1τ

(X (t)− µ

)dt + σdB(t), where τ is a mean-reversion

time, µ is the overall mean, and σ is the short-term variability.

I O-U process is a Gaussian process with a Matern(1/2) kernel.

I X(t∆) ∼ Normal2n with some mean vector and covariance matrix

I p(X(t∆) | µ, σ, τ,∆) =

p(X (t∆1 ) | µ, σ, τ,∆)×

∏2nj=2 p(X (t∆

j ) | X (t∆j−1), µ, σ, τ,∆)

8 / 25

Bayesian: Prior Distributions for Parameters

From statistician’s perspective,

I Prior distributions must guarantee posterior propriety, i.e.∫p(parameters | data) d(parameters)

=

∫Lik(parameters)× p(parameters) d(parameters) <∞.

I Without knowing posterior propriety, no one can tell whether theresulting posterior sample is from the target posterior distribution ornot (Hobert and Casella, 1994).

I When we are not sure about posterior propriety: Use proper priors!

From astrophysicist’s perspective,

I Set up parameters in the proper prior distributions in a way to reflecton astrophysics and the dynamic of the O-U process.

9 / 25

Bayesian: Prior Distributions for ParametersI ∆ ∼ Uniform(u1, u2)

I [u1, u2], if ∃ prior information to restrict the range of ∆, e.g., aphysical model of the lens, redshift, and relative locations.

I [t1 − tn, tn − t1], otherwise.

I Mean of the O-U, µ ∼ Uniform(−30, 30), why 30?I Magnitude offset, β0 ∼ Uniform(−60, 60), why 60?

Inv-Gamma distribution sets a soft lower bound of a random variable

If X ∼ Inv-Gamma(a, b), p(x) ∝ 1xa+1 exp(−b/x) with a mode at b

a+1 .

I Variance of the O-U, σ2 ∼ Inv-Gamma(1, c), why 1&c?I Timescale of the O-U, τ ∼ Inv-Gamma(1, 1), why 1&1?I Estimates for 9,275 SDSS quasar light curves (MacLeod+, 2010)

10 / 25

Bayesian: Full Posterior and Sampler

Notation: θOU ≡ (µ, σ2, τ) and Dobs ≡ {x, y}I Full Posterior: p(X(t∆),∆, β0, θOU | Dobs)

∝ p(x(t), y(t) | X(t∆),∆, β0) Observed data

× p(X(t∆) | ∆, θOU) Latent data

× p(∆, β0, θOU) Priors

I Metropolis-Hastings within Gibbs sampler

1. p(X(t∆),∆ | β0, θOU ,Dobs , )

2. p(β0 | X(t∆),∆, θOU ,Dobs)

3. p(θOU | β0,X(t∆),∆,Dobs)

I Pros: Complete investigation on all the model parameters

I Cons: Inefficient when ∃ multimodality

11 / 25

Frequentist: Profile likelihood

A profile likelihood function enables us to focus on the parameter ofinterest with nuisance parameters maximized out (Berger et al., 1996)

I L(∆, β0, θOU) =∫R2n p(x, y | X(t∆),∆, β0)× p(X(t∆) | ∆, θOU) dX(t∆)

I Lprof (∆) ≡ maxβ0,θOUL(∆, β0, θOU) = L(∆, β0(∆), θOU(∆))

I Lprof (∆) ∝ p(∆ | Dobs) asymptotically

I Provides approximate posterior mean, mode, standard deviation, andmost importantly shape of the (approximate) distribution

I Pros: Simple to implement and easy to find multi-modes

I Cons: Computationally expensive for drawing a finer curve / noinformation about the relationship among parameters

12 / 25

Our time delay estimation strategy

Bayesian and frequentist methods complement each other!

Our time delay estimation strategy:

1. Obtain the profile likelihood curve to check multimodality

2. Initialize Bayesian method near the modes identified by Lprof(∆)

13 / 25

Example: Time Delay ChallengeTime Delay Challenge (TDC, Dobler et al., 2015; Liao et al., 2015)

I A blind competition held by 8 astrophysicists from 2013 to 2014.I Goals: (1) Providing an observation strategy for the LSST.

(2) Improving current estimation methods.I About 5,000 simulated data sets with some time delays (O-U).I 13 teams blindly analyzed the simulated data.

14 / 25

Example: Time Delay ChallengeSimulated data of a doubly-lensed quasar from TDC

(1) The entire profile (log) likelihood curve

15 / 25

Example: Time Delay Challenge(2) Posterior distribution of ∆ initialized near the dominant mode

(3) Estimation summary for ∆

Method Truth Post. Mean Post. SD

Bayesian45.85

46.26 0.41Profile likelihood 46.26 0.40

(4) Model Checking: Blue light curve is shifted by ∆, and β0.

16 / 25

Microlensing

I Microlensing occurs when stars unusually close to the paths of lightintroduce independent noise into magnification of brightness lightcurves (Tewes et al., 2013).

I Two light curves may have different long-term trends, e.g.,polynomial.

17 / 25

Microlensing: Problem

I A curve-shifted model does not work because one of the latentcurves is no longer a shifted version of the other.

I A small overlap between two light curves (bottom plots) is the onlysimilar fluctuation patterns detectable by shifting one of the lightcurves → several modes near margins of the range of ∆.

18 / 25

Microlensing: ModelMicrolensing model

I Popular way is to model the long-term trend of each light curve viaa polynomial regression.

I Our microlensing model accounts for the difference betweenlong-term trends using an mth-order polynomial regression.

Y (t) = X (t −∆) + w>m (t − ∆)β,

where wm(t −∆) ≡ (1, t −∆, . . . , (t −∆)m)>, and

β ≡ (β0, β1, . . . , βm)> are regression coefficients.

I Our microlensing model reduces the number of regressioncoefficients by half!

19 / 25

Microlensing Example 1(1) We set m = 3 as a default (Kochanek+, Morgan+, Tewes+).

(2) Estimation summary (Error ≡ |∆true − ∆| and χ ≡ Error/SD)

Method Truth E(∆|Data) SD(∆|Data) Error χ

Bayesian5.86

6.34 0.28 0.48 1.71Profile Lik. 6.36 0.28 0.50 1.76

(3) Model Checking: Blue light curve is adjusted by ∆ and β.

20 / 25

Microlensing Example 2: Q0957+561r -band data collected at the US Naval Observatory (Hainline+, 2012).

ResearchersNumber of Observation Measurement

∆ SEobservations period error (mag)

Pelt et al. (1996) 831 1979–1994 0.0159 423 6

Oscoz et al. (1997) 86 1994–1996 0.01, 0.02 424 3

Serra-Ricart et al. (1999) 197 1996–1998 0.023, 0.025 425 4

Oscoz et al. (2001) 100 1994–1996 0.009, 0.01 423 2

Shalyapin et al. (2012) 371 2005–2010 0.012 420.6 1.9

This work 57 2008–2011 0.004423.69 2.02423.21 2.81

21 / 25

Microlensing Example 3: J1029+2623Data reported by Fohlmeister et al. (2013).

Researchers Method Estimate 90% Interval

Fohlmeister et al. (2013) χ2-minimization (AIC, BIC) 744 (734, 754)

Kumar, Stalin, andDifference-smoothing 743.5 (734.6, 752.4)

Prabhu (2014)

This workBayesian 735.28 (733.08, 737.59)

Profile likelihood 733.11 (732.94, 738.44)

Our point estimate is much smaller than theirs (by about 10 days) & our90% intervals are much shorter than theirs. What’s wrong here? Is itbecause our model is over-confident?

22 / 25

Microlensing Example 3: J1029+2623 (cont.)

Fohlmeister et al. (2013)

I A point estimator based on high-dimensional optimization

I Linear microlensing model (AIC) + a model w/o microlensing (BIC)

Kumar et al. (2014)

I A point estimator also based on high-dimensional optimization

I A spline with a Gaussian kernel to account for microlensing

It reveals that our model accounts for microlensing better than theirs.

23 / 25

Discussion: Time Delay Challenge IIDoubly-lensed multi-filter light curves (Marshall et al., 2016+)

I Six bands, u, g , r , i , z , y , lead to vector time series.

I 5,000+ systems, 6 bands for each system, 150 obs. for each band.

I Bluer filters are more sensitive to microlensing than redder filters.

I All the information (except ∆) needed to calculate H0 will be given.

I Evaluation is based on two numbers, H0 estimate and its uncertainty.

I http://timedelaychallenge.org for more information!

24 / 25

Reference1. H. Tak, K. Mandel, D. A. van Dyk, V. Kashyap, X. Meng, and A. Siemiginowska (in progress) “Bayesian and Profile Likelihood

Strategies for Time Delay Estimation from Stochastic Time Series of Gravitationally Lensed Quasars”, tentatively accepted to

Annals of Applied Statistics. (ArXiv 1602.01462)

2. M. Tewes, F. Courbin, and G. Meylan (2013) “COSMOGRAIL: the COSmological MOnitoring of GRAvitational Lenses”

Astronomy & Astrophysics, 553, A120.

3. S. Kozlowski et al. (2010), “Quantifying quasar variability as part of a general approach to classifying continuously varying

sources,” The Astrophysical Journal, 708, 927.

4. C.L. MacLeod et al. (2010), “Modeling the time variability of SDSS Stripe 82 quasars as a damped random walk, ” The

Astrophysical Journal, 721, 1014.

5. Y. Zu et al. (2011), “An alternative approach to measuring reverberation lags in active galactic nuclei,” The Astrophysical

Journal, 735, 80.

6. B. Kelly, J. Bechtold, and A. Siemiginowska (2009) “Are the variations in quasar optical flux driven by thermal fluctuation?” The

Aphysical Journal, 698, 895.

7. G. Dobler, C. Fassnacht, T. Treu, P. Marchall, K. Liao, A. Hojjati, E. Linder, and N. Rumbaugh (2015) “Strong lens time delay

challenge: I. Experimental design.” The Astrophysical Journal, 799, 168.

8. K. Liao et al. (2015) “Strong lens time delay challenge: II. Results of TDC1.” The Astrophysical Journal, 800, 23.

9. C. S. Kochanek et al. (2006) “The time delays of gravitational lens HE0435+1223: An early-type galaxy with a rising rotation

curve.” The Astrophysical Journal, 640, 47.

10. C. W. Morgan et al. (2012) “Further evidence that quasar X-ray emitting regions are compact: X-ray and optical microlensing in

the lensed quasar Q J0158+4325.” The Astrophysical Journal, 756, 9.

11. S. Refsdal (1964) “The Gravitational Lens Effect.” Monthly Notices of the Royal Astronomical Society, 128, 295–306.

12. E. V. Linder (2011) “Lensing Time Delays and Cosmological Complementarity.” Phys. Rev. D, 84, 12.

13. A. Hojjati, A. G. Kim, and E. V. Linder (2014) “Robust strong lensing time delay estimation.” Phys. Rev. D, 87, 12.

14. Bonvin et al. (2016) “COSMOGRAIL: the COSmological MOnitoring of GRAvItational Lenses XV. Assessing the achievability

and precision of time-delay measurements” Astronomy & Astrophysics, 585, A88.

25 / 25

Strong Lens Time Delay Estimation · Tak (Hyungsuk) Tak SAMSI ASTRO / International CHASC...

Documents

Transcript of Strong Lens Time Delay Estimation · Tak (Hyungsuk) Tak SAMSI ASTRO / International CHASC...