Bayesian Methods in High Energy Astrophysics
Aneta Siemiginowska
Harvard-Smithsonian Center for Astrophysics
http://hea-www.harvard.edu/AstroStat/
CHASC Astro-Statistics Collaboration
SAMSI Astrostatistics - Sep.19, 2012 Aneta Siemiginowska 2
Outline
• Data collection in High Energy Astrophysics
• Statistical Model setting
• CHASC Algorithms and Application Examples
  • Spectral analysis with Bayes
  • Calibration Uncertainties
  • Low Counts Hardness Ratio
  • Significance of the diffuse structures in low counts images
Not covered: solar data, timing, feature detection, upper limits, and more
Sources of High Energy Radiation
• Stars
• Supernova remnants
• Hot gas: galactic outflows, clusters of galaxies
• Compact objects: neutron stars, accreting black holes, supermassive black holes
• Relativistic jets
• GRBs
• etc…
Data Collection in Space
[Diagram: Astronomical Object; Interstellar Medium; Telescope + Detectors (Chandra X-ray Observatory)]
[Diagram labels: Physics; Loss of data; Measurement Process; Loss of data; Instrument characteristics; Calibration]
Data Collection
• Data are collected for each arriving photon:
  • the (2-dimensional) location - sky coordinates
  • the energy
  • the arrival time
• All variables are discrete
• High resolution -> finer discretization, e.g., 4096 x 4096 spatial or 1024 spectral bins
• Table with photon counts for:
  • Spectral analysis - 1D
  • Spatial analysis - 2D
  • Timing analysis - 1D
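As a rough illustration of how a per-photon event list turns into the count tables listed above, here is a minimal sketch. The event list is simulated, and every name and number in it (`x`, `y`, `channel`, `time`, the 64 x 64 rebinning, the 100 s time bins) is an assumption made for the example, not part of any real pipeline.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical event list: one entry per detected photon (simulated here).
n_events = 5000
x = rng.uniform(0, 4096, n_events)         # position along detector x (pixels)
y = rng.uniform(0, 4096, n_events)         # position along detector y (pixels)
channel = rng.integers(0, 1024, n_events)  # discrete energy channel, 0..1023
time = rng.uniform(0.0, 1.0e4, n_events)   # arrival time (seconds)

# Spectral analysis (1D): counts per energy channel.
spectrum = np.bincount(channel, minlength=1024)

# Spatial analysis (2D): counts per pixel, rebinned to 64 x 64 for brevity.
image, _, _ = np.histogram2d(x, y, bins=64, range=[[0, 4096], [0, 4096]])

# Timing analysis (1D): counts per 100 s time bin.
light_curve, _ = np.histogram(time, bins=100, range=(0.0, 1.0e4))

print(spectrum.shape, image.shape, light_curve.shape)
```

Each table holds the same photons, projected onto a different variable, so the three totals agree.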
Instrumental Effects: Detection inefficiency
• Image:
  • exposure map: "sensitivity to photons per area"
• Spectrum:
  • effective area (ARF): "sensitivity to photons per energy"
[Figure axis: Energy]
Instrumental Effects: Blurring
• Image
  • point source observed size depends on the source location on the detector
  • "blurring" is described by a point spread function (PSF)
• Spectrum
  • photon energy is "blurred"
  • the probability of detecting a photon of a given energy in a detector channel is described by a redistribution matrix (RMF)
[Figures: RMF (axes: Energy, Detector channels); PSF Simulated Images]
A Model-Based Statistical Paradigm
• Model Building:
  • Model source spectra, image and/or time series
  • Model the data collection process
    • Background contamination
    • Instrumental effects (effective area, response, pileup, PSF)
  • Results in a highly structured multi-level model
• Model-Based Statistical Inference
  • Bayesian posterior distribution
  • Maximum likelihood estimation
• Sophisticated Statistical Computation Methods are Required
  • Goals: computational stability and easy implementation
  • Emphasize natural link with models: The Method of Data Augmentation
Highly Structured Models Required
[Figure (van Dyk et al. 2001): Emitted Spectrum; Loss of data; Observed]
Model the source and the data collection mechanism directly, and include statistical procedures to fit the resulting highly structured models and address the substantial scientific questions.
Bayesian Inference using Monte Carlo
• The Building Blocks of Bayesian Analysis
  • The sampling distribution: p(Y | θ), for data Y and model parameters θ
  • The prior distribution: p(θ)
  • The Bayes theorem and posterior distribution: p(θ | Y) ∝ p(Y | θ) p(θ)
• Inference using a Monte Carlo Sample
We use MCMC (e.g. Gibbs sampler) to obtain a Monte Carlo sample
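To make "inference using a Monte Carlo sample" concrete, here is a small sketch using a Poisson rate with a Gamma prior. This conjugate case is chosen only because the posterior can be sampled directly, with no MCMC needed. The counts and the prior values `a` and `b` are made up for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative data: photon counts in 5 equal exposures (made-up numbers).
counts = np.array([3, 5, 2, 4, 6])

# Sampling distribution: counts ~ Poisson(lam).
# Prior distribution: lam ~ Gamma(shape=a, rate=b); a and b are assumed values.
a, b = 1.0, 0.1

# Bayes' theorem gives the posterior Gamma(a + sum(counts), rate = b + n).
post_shape = a + counts.sum()
post_rate = b + counts.size

# Monte Carlo sample from the posterior (numpy's gamma takes scale = 1 / rate).
draws = rng.gamma(post_shape, 1.0 / post_rate, size=20000)

# Posterior summaries computed directly from the sample.
post_mean = draws.mean()
post_median = np.median(draws)
lower, upper = np.percentile(draws, [2.5, 97.5])  # central 95% interval

print(f"mean={post_mean:.2f} median={post_median:.2f} "
      f"95% interval=({lower:.2f}, {upper:.2f})")
```

Once a sample from the posterior is available, by direct simulation as here or by MCMC, the same summaries (means, quantiles, intervals) apply unchanged.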
Bayesian Deconvolution
• Complex data collection needs to be included in the statistical model:
[Equation with labeled terms: Expected photon count; Instrumental "blurring"*; Non-homogeneous stochastic censoring*; Physical Source Model; Background Contamination**]
* from calibration
** from separate observation
Observed counts are modeled as independent Poisson variables with means of λ
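The terms named on this slide can be combined into a toy forward model. The sketch assumes one common arrangement: the source model is multiplied by the effective area (the censoring), blurred by the redistribution matrix, and the background is added. All array values are invented, and the function names `expected_counts` and `poisson_loglike` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(3)

def expected_counts(source_flux, arf, rmf, background):
    """Expected counts per detector channel (assumed arrangement of terms).

    source_flux : (n_energy,) physical source model per energy bin
    arf         : (n_energy,) detection efficiency (stochastic censoring)
    rmf         : (n_channel, n_energy) blurring; column j is P(channel | energy j)
    background  : (n_channel,) expected background counts
    """
    return rmf @ (arf * source_flux) + background

def poisson_loglike(observed, lam):
    """Poisson log-likelihood, dropping the constant log(observed!) term."""
    return float(np.sum(observed * np.log(lam) - lam))

# Toy inputs (made-up numbers).
energy = np.array([1.0, 2.0, 3.0, 4.0])      # bin centers, keV
source_flux = 100.0 * energy ** -2.0         # power law with photon index 2
arf = np.array([0.5, 0.8, 0.6, 0.3])         # toy efficiency per energy bin
rmf = 0.8 * np.eye(4) + 0.1 * np.eye(4, k=1) + 0.1 * np.eye(4, k=-1)
rmf /= rmf.sum(axis=0)                       # each column sums to 1
background = np.full(4, 0.5)

lam = expected_counts(source_flux, arf, rmf, background)
observed = rng.poisson(lam)                  # independent Poisson counts
print(lam, observed, poisson_loglike(observed, lam))
```

Because the toy `rmf` columns are normalized, blurring only moves expected counts between channels; the effective area is what removes them.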
Source Models
MultiScale Models for Diffuse Emission
Parametrized finite mixture models (several components)
Compound deconvolution models
simultaneous instrumental and physical deconvolution of complex sources
Smoothing prior distributions
MCMC Simulations from the Posterior
[Flowchart boxes: Data; Model; Calibration; prior; Compute Likelihood; Draw parameters; Accept/Reject; Update parameters]
Simulation from the posterior distribution requires careful and efficient algorithms:
Draw parameters from a "proposal distribution", compute the likelihood and posterior probability of the "proposed" parameter value given the observed data, and use a Metropolis-Hastings criterion to accept or reject the "proposed" values.
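The draw / compute / accept-or-reject loop described above can be sketched for a toy power-law spectrum. This is an illustrative random-walk sampler, not the pyBLoCXS code. The data are simulated, the flat prior bounds and step sizes are assumed values, and the Gaussian proposal is symmetric, so the Metropolis-Hastings criterion reduces to the posterior ratio.

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data: Poisson counts from a power law amp * E**(-gamma).
energy = np.linspace(1.0, 8.0, 30)
true_amp, true_gamma = 200.0, 1.7
observed = rng.poisson(true_amp * energy ** -true_gamma)

def log_posterior(theta):
    """Poisson log-likelihood plus a flat prior (assumed bounds)."""
    amp, gamma = theta
    if amp <= 0.0 or not (0.0 < gamma < 5.0):
        return -np.inf
    lam = amp * energy ** -gamma
    return np.sum(observed * np.log(lam) - lam)

def metropolis_hastings(log_post, start, step, n_iter, rng):
    current = np.asarray(start, dtype=float)
    current_lp = log_post(current)
    chain = np.empty((n_iter, current.size))
    n_accept = 0
    for i in range(n_iter):
        # Draw parameters from the proposal distribution.
        proposed = current + step * rng.standard_normal(current.size)
        # Compute the posterior probability of the proposed value.
        proposed_lp = log_post(proposed)
        # Accept/reject: symmetric proposal, so only the posterior ratio matters.
        if np.log(rng.uniform()) < proposed_lp - current_lp:
            current, current_lp = proposed, proposed_lp
            n_accept += 1
        chain[i] = current  # a rejected step repeats the current value
    return chain, n_accept / n_iter

chain, accept_rate = metropolis_hastings(
    log_posterior, start=[150.0, 1.5], step=np.array([8.0, 0.04]),
    n_iter=20000, rng=rng)
burned = chain[5000:]  # discard burn-in
print("acceptance rate:", accept_rate)
print("posterior means (amp, gamma):", burned.mean(axis=0))
```

The step sizes were picked by hand for this toy problem. In practice the proposal scale is usually derived from the fit covariance.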
pyBLoCXS
Python implementation in Sherpa, with two MCMC samplers:
• Metropolis-Hastings:
  » centered on the best fit values
• Metropolis-Hastings mixed with Metropolis jumping rule:
  » centered on the current draw of parameters
Explores parameter space and summarizes the full posterior or marginal posterior distributions.
Computed parameter uncertainties can include calibration errors.
Simulates replicate data from the posterior predictive distributions.
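The difference between the two jumping rules can be shown with a one-parameter sketch. It is a simplified stand-in, not the pyBLoCXS implementation: it uses Gaussian proposals throughout, and `mixed_sampler`, `p_m`, and the toy Poisson-rate target are hypothetical names and choices. A proposal centered on the best fit does not depend on the current draw, so its acceptance ratio needs the proposal-density correction; the jump centered on the current draw is symmetric and does not.

```python
import numpy as np

def log_normal_pdf(x, mean, sigma):
    """Log density of a normal distribution, dropping the constant term."""
    return -0.5 * ((x - mean) / sigma) ** 2 - np.log(sigma)

def mixed_sampler(log_post, best_fit, sigma, n_iter, p_m, rng):
    """Mix a Metropolis jump (prob. p_m) with an MH draw centered on best_fit."""
    current, current_lp = best_fit, log_post(best_fit)
    draws = np.empty(n_iter)
    for i in range(n_iter):
        if rng.uniform() < p_m:
            # Metropolis rule: centered on the current draw (symmetric).
            proposed = current + sigma * rng.standard_normal()
            proposed_lp = log_post(proposed)
            log_ratio = proposed_lp - current_lp
        else:
            # Metropolis-Hastings rule: centered on the best-fit value.
            proposed = best_fit + sigma * rng.standard_normal()
            proposed_lp = log_post(proposed)
            log_ratio = (proposed_lp - current_lp
                         + log_normal_pdf(current, best_fit, sigma)
                         - log_normal_pdf(proposed, best_fit, sigma))
        if np.log(rng.uniform()) < log_ratio:
            current, current_lp = proposed, proposed_lp
        draws[i] = current
    return draws

# Toy target: Poisson rate with N counts in exposure T and a flat prior,
# so the exact posterior is Gamma(N + 1, rate = T).
N, T = 50, 10.0

def log_post(lam):
    return N * np.log(lam) - T * lam if lam > 0 else -np.inf

rng = np.random.default_rng(6)
draws = mixed_sampler(log_post, best_fit=N / T, sigma=np.sqrt(N) / T,
                      n_iter=20000, p_m=0.5, rng=rng)
print("posterior mean:", draws.mean(), "exact:", (N + 1) / T)
```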
pyBLoCXS: Running it!
[Figure panels: Trace of a parameter during MCMC run; 3D parameter space probed with MCMC (axes: statistics, Gamma, NH); Cumulative distribution of a parameter (median marked)]
Account for Calibration Uncertainties
Chandra ACIS-S Effective Area
• Non-linear errors cannot simply be added to the statistical errors.
• Include a draw from an ensemble of effective area curves in the simulations.
[Flowchart boxes: Draw effective area; Data; Model; Calibration; prior; Compute Likelihood; Draw parameters; Accept/Reject; Update parameters]
Drake et al. 2006 Proc. SPIE, 6270, 49
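A stripped-down sketch of drawing an effective-area curve inside the simulation loop follows. It fits only a normalization with a Gamma prior, so the parameter draw given a curve is exact rather than an accept/reject step. The ensemble here is a made-up set of 5% overall rescalings of a flat default curve, not the Drake et al. ensemble, and every name and number is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)

n_bins = 20
source_shape = np.linspace(2.0, 0.5, n_bins)  # fixed spectral shape (assumed known)
arf_default = np.full(n_bins, 100.0)          # default effective area, A0

# Hypothetical ensemble: 200 curves differing from A0 by ~5% overall scale.
n_curves = 200
scale = 1.0 + 0.05 * rng.standard_normal(n_curves)
arf_ensemble = arf_default * scale[:, None]

# Simulated high-count data using the default curve.
true_norm = 3.0
observed = rng.poisson(true_norm * arf_default * source_shape)

a, b = 1.0, 1.0e-3  # Gamma(shape, rate) prior on the normalization (assumed)

def draw_norm(arf):
    """Exact draw of the normalization from its posterior given one ARF."""
    shape = a + observed.sum()
    rate = b + np.sum(arf * source_shape)
    return rng.gamma(shape, 1.0 / rate)

n_iter = 5000
fixed_arf = np.array([draw_norm(arf_default) for _ in range(n_iter)])

with_calibration = np.empty(n_iter)
for i in range(n_iter):
    arf = arf_ensemble[rng.integers(n_curves)]  # draw effective area
    with_calibration[i] = draw_norm(arf)        # draw parameters given it

print("std, fixed ARF:        ", fixed_arf.std())
print("std, ARF drawn per step:", with_calibration.std())
```

With about 7500 expected counts the statistical spread is near 1%, so the 5% calibration spread dominates the second set of draws.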
Effects of Calibration Uncertainties
Lee et al. 2011
Effect of the ARF uncertainty on fitted parameters and error bars:
Calibration uncertainty exceeds the statistical errors in the high S/N case
Simulations of 10^5 counts using A0, fit using each of the colored ARFs
Sim1: Γ = 2, NH = 1e23
Sim2: Γ = 1, NH = 1e21
[Figure panels: Sim1; Sim2. Deviations from the default effective area ARF (A0); 30 ARFs with the largest deviations]
Extreme Low Counts: Hardness Ratios
• Comparison of counts in two bands.
• The lowest resolution spectral analysis (often used, e.g., in large surveys and for classifying faint sources).
• Ignores the instrumental effects - only valid for the same instrument
• Standard methods based on Gaussian assumptions fail for low counts.
• Replace the Gaussian likelihood with the Poisson likelihood
• Use Bayesian Framework => combine the likelihood with a prior distribution to compute the posterior distributions of hardness ratios
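As a minimal sketch of the Poisson-plus-prior idea, the example below samples the posterior of the soft and hard band intensities and derives three common summaries from the draws. It is a simplification: background is ignored, which a full treatment would model, and the counts and prior shape are made-up values. The definitions used are simple ratio S/H, color log10(S/H), and fractional difference (H - S)/(H + S).

```python
import numpy as np

rng = np.random.default_rng(5)

soft_counts, hard_counts = 3, 7  # made-up low-count observation

# Poisson likelihood with a Gamma prior of shape 1 and rate -> 0 (flat prior),
# so each intensity has posterior Gamma(counts + 1, rate = 1).
prior_shape = 1.0
n_draws = 50000
lam_soft = rng.gamma(soft_counts + prior_shape, 1.0, size=n_draws)
lam_hard = rng.gamma(hard_counts + prior_shape, 1.0, size=n_draws)

# Posterior draws of the three summaries.
ratio = lam_soft / lam_hard
color = np.log10(ratio)
hr = (lam_hard - lam_soft) / (lam_hard + lam_soft)

classic_hr = (hard_counts - soft_counts) / (hard_counts + soft_counts)
lower, upper = np.percentile(hr, [16, 84])  # central 68% interval

print(f"classical HR estimate: {classic_hr:.2f}")
print(f"posterior mean HR: {hr.mean():.2f}, 68% interval ({lower:.2f}, {upper:.2f})")
print(f"posterior median color: {np.median(color):.2f}")
```

The interval comes straight from the posterior draws, so it stays inside the allowed range of -1 to 1 even with very few counts.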
Hardness Ratio BEHR Simulations Study
• BEHR - uses Poisson models, background contaminated soft and hard counts
• Classic Gaussian vs. Bayes Poisson approach
Park et al. 2006
[Figure labels: true; mode; classic; Bayes; mean; 30; 3]
• With large counts, both methods yield similar results, but the Bayesian method provides more precise error bars for the hardness ratio than the classical method
• The classical method fails in low counts; its error bars are unreliable
Simple ratio fails => Color or HR better
BEHR Application
[Figure: Evolution of spectral hardness; axis labels: Intensity, Color, Time; label: samples]
Park et al. 2006
Low Counts Images
[Figure panels: Original binned X-ray image; Smoothed image]
How significant is the diffuse emission?
Image Analysis: Complex Statistical Data
Smooth Extended Source
• Flexible MultiScale Model
"Point" Sources
• Model the location, intensity and perhaps extent and shape
Connors and van Dyk 2007
LIRA: Simulation Study
[Figure panels: Known Physical Model; Simulated Image]
• Simulate the data under the assumed physical model
• Fit two types of models:
  1/ Physical Model
  2/ Physical Model + MultiScale Residual (Extra Emission)
Connors and van Dyk 2007
LIRA: Simulation Study
[Figure panels: Known Physical Model; Simulated Image; Posterior Mean of Residuals; labels: null, interesting]
Connors and van Dyk 2007
Evidence for a Structure
Examine joint posterior distribution of:
• Baseline scale factor: α
• Expected total multi-scale counts: β
Connors and van Dyk 2007
[Figure labels: null; extra]
Summary
• Data collection in high energy astrophysics is complex.
• Bayesian framework provides a natural way to include complex instrumental characteristics and initial knowledge about the astronomical sources:
  • Spectral analysis
  • Calibration uncertainties
  • Hardness ratios
  • Image analysis
• MCMC to explore the posterior distributions.
Conclusions
• The search for highly irregular and unexpected structure in astronomical data poses many statistical challenges.

• Model-based methods allow us to make progress on formalizing and answering scientific questions.

• More sophisticated computational methods and methods for summarizing high dimensional posterior distributions are yet to be explored.
Further Reading
1. van Dyk, D. A., Connors, A., Kashyap, V. L., and Siemiginowska, A. (2001). Analysis of Energy Spectra with Low Photon Counts via Bayesian Posterior Simulation. The Astrophysical Journal, 548, 224-243.
2. van Dyk, D. A., Connors, A., Esch, D. N., Freeman, P., Kang, H., Karovska, M., Kashyap, V., Siemiginowska, A., and Zezas, A. (2006). Deconvolution in High Energy Astrophysics: Science, Instrumentation, and Methods (with discussion). Bayesian Analysis, 1, 189-236.
3. Park, T., Kashyap, V. L., Siemiginowska, A., van Dyk, D. A., Zezas, A., Heinke, C., and Wargelin, B. J. (2006). Hardness Ratios with Poisson Errors: Modeling and Computations. The Astrophysical Journal, 652, 610-628.
4. Connors, A. and van Dyk, D. A. (2007). How To Win With Non-Gaussian Data: Poisson Goodness-of-Fit. SCMA IV (Eds. G. J. Babu and E. D. Feigelson), ASPC, 371, 101.
5. Lee, H., Kashyap, V. L., van Dyk, D. A., Connors, A., Drake, J. J., Izem, R., Meng, X. L., Min, S., Park, T., Ratzlaff, P., Siemiginowska, A., and Zezas, A. (2011). Accounting for Calibration Uncertainties in X-ray Analysis: Effective Areas in Spectral Fitting. The Astrophysical Journal, 731, 126-145.
6. Kashyap, V. L., van Dyk, D. A., Connors, A., Freeman, P. E., Siemiginowska, A., Xu, J., and Zezas, A. (2010). On Computing Upper Limits to Source Intensities. The Astrophysical Journal, 719, 900-914.
7. Protassov, R., van Dyk, D. A., Connors, A., Kashyap, V. L., and Siemiginowska, A. (2002). Statistics: Handle with Care, Detecting Multiple Model Components with the Likelihood Ratio Test. The Astrophysical Journal, 571, 545-559.