Surveys of cosmic populations: Statistical issues
Eddington, Malmquist, Lutz-Kelker and all that
Tom Loredo, Dept. of Astronomy, Cornell University
http://www.astro.cornell.edu/staff/loredo/
Cosmic Populations @ CASt — 9–10 June 2014
Size-Frequency Distributions
log(N)–log(S) curves, number counts, number-size dist’ns. . .
[Figure panels: lunar craters, TNOs, GRBs, quasars, solar flares]
Intrinsic Size-Frequency Distributions
Craters, Solar flares, asteroids
Known source distance → convert apparent sizes to intrinsic sizes
[Figure panels: lunar craters, solar flares]
Note approximate power law behavior
Apparent Size-Frequency Distributions
TNOs, star counts, galaxy counts, GRBs. . .
Distance unknown → SFD is “projection” of intrinsic dist’n
$F \propto L/d^2$ (stars, galaxies)
$F \propto D^2/(d_\odot^2\, d_\oplus^2)$ (minor planets)
∼ 1200 GRBs from 4th BATSE Survey
Diverse Distributions
Basic: Directions and Fluxes
Peak fluxes and directions of GRBs from 4B catalog
Directions, Fluxes and Indicators
Luminosity & Distance Distributions
$10^4$ galaxies from Millennium Galaxy Catalogue
High-dimensional
Exoplanet properties
Unadjusted scatterplots from Open Exoplanet Catalogue
Two classes of surveys
Blind (ab initio discovery) surveys
Survey a region and attempt to find sources using only the new data
• Most large-scale sky surveys
• GRB surveys
• . . .
Targeted (follow-up/counterpart) surveys
Survey a known population in a new regime
• Multi-λ surveys
• Variability surveys
• SN surveys
• Exoplanet discovery via RV, transit observations
• . . .
Focus here on blind surveys aiming to learn luminosity distributions
See Feigelson’s talk for discussion of selection effects in targeted surveys
Surveying and “Un-surveying”
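[Diagram: the survey chain. Population → (Mapping: indicator scatter & transformation bias) → Observables → (Observation: measurement error) → Measurements → (Selection: truncation & censoring) → Catalog. Axes shown: L vs. r for the population space; F vs. z (and χ) for the later stages. Markers distinguish precise from uncertain quantities.]
⇐ Inference goes this way!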
• This lecture: Understanding the forward process
• Later lectures: How to go backward
Statistical issues for surveys
1 Observables, desirables & integrals
    Brightness distributions
    Distance–luminosity distributions
2 Cornucopia of complications
    Selection effects
    Scatter distortions
3 Statistical methodology: A glimpse
“You can’t always get what you want”
What we want
• The distribution of sources in space: number density
$n(\mathbf{r}) = n(r, \Omega)$ at distance $r$, direction $\Omega$
$\quad = n(r)$ (isotropic case)
An intensity function for a (possibly non-homogeneous) Poisson point process:
$$p(\text{object in } dV) = n(\mathbf{r})\,dV$$
and for disjoint regions $V_1$ and $V_2$,
$$p(\text{object in } V_1 \mid \text{object in } V_2, n) = p(\text{object in } V_1 \mid n)$$
$$\Rightarrow\quad p(N \text{ objects in } V \mid n) = \frac{\mu^N}{N!}\,e^{-\mu}, \qquad \mu = \int_V n(\mathbf{r})\,d\mathbf{r}$$
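The Poisson count probability above can be checked numerically. The following is a minimal sketch, assuming an isotropic density and a spherical survey volume; the density value `n0`, the radius, and the function names are hypothetical choices for illustration, not part of the talk.

```python
import math

def expected_count(density, r_max, n_steps=10_000):
    """mu = integral of n(r) over a sphere of radius r_max (isotropic case),
    i.e. 4*pi * int_0^r_max n(r) r^2 dr, evaluated with the midpoint rule."""
    dr = r_max / n_steps
    total = 0.0
    for i in range(n_steps):
        r = (i + 0.5) * dr
        total += density(r) * r * r * dr
    return 4.0 * math.pi * total

def poisson_pmf(n_objects, mu):
    """p(N objects in V | n) = mu^N exp(-mu) / N!, computed in log space."""
    return math.exp(n_objects * math.log(mu) - mu - math.lgamma(n_objects + 1))

n0 = 0.01  # hypothetical constant number density (arbitrary units)
mu = expected_count(lambda r: n0, r_max=5.0)
print(mu, poisson_pmf(5, mu))
```

A non-homogeneous density only changes the function passed as `density`; the count distribution stays Poisson with the new `mu`.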
• The distribution of source luminosities: luminosity distribution
$f_L(L; \mathbf{r})$ = pdf for $L$ given $\mathbf{r}$
$\quad = \delta(L - L_0)$ (“standard candles”)
$\quad = f_L(L)$ (“universal”)
$\quad = f_L(L; r)$ (“evolution”, isotropic)
• The luminosity function: $\Psi(\mathbf{r}, L) = n(\mathbf{r})\, f_L(L; \mathbf{r})$
This defines a marked point process
Terminology & notation are not uniform in the literature. E.g., $f_L$ is sometimes called the “luminosity function”
What we get
The primary observables are direction, $\Omega$, and flux (energy/unit time/unit area):
$$F = \frac{L}{4\pi r^2}$$
← the “root of all evil!”
Note this conflates $r$ and $L$!
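To make the conflation concrete, here is a small sketch of two hypothetical sources (arbitrary units, Euclidean space, no extinction) that differ in both luminosity and distance yet yield the same flux. The function name and the numbers are illustrative assumptions.

```python
import math

def flux(luminosity, distance):
    """Inverse-square law F = L / (4 pi r^2)."""
    return luminosity / (4.0 * math.pi * distance ** 2)

# Hypothetical sources: four times the luminosity at twice the distance.
f_near = flux(1.0, 10.0)
f_far = flux(4.0, 20.0)
print(f_near, f_far)
```

A flux measurement alone cannot tell these two sources apart, which is why additional indicators are needed.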
Further complications (ignored here!)
• Passbands/k-corrections
• Extinction
• Reflective sources ($F \propto 1/r^4$)
• Transient/variable sources
• . . .
Example: Quasar Optical Luminosity Function
[Figure panels: magnitude dist’n; redshifts]
2QZ Magnitudes, directions, & redshifts for > 25,000 QSOs with B < 21
Cosmology
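Geometry of space-time alters inverse-square law (redshift, time dilation) and volume element in spatial integrals:
Flux in band $b$:
$$F_b = \frac{L_{\rm bol}}{4\pi r^2}\, C_1(r, \chi, \alpha)$$
$\chi$: cosmo params $H_0$, $\Omega_m$, $\Omega_\Lambda$; $\alpha$: spectral parameters
$$dV = r^2\,dr\,d\Omega\; C_2(r, \chi)$$
Redshift $z$ (observable!) usually used as a proxy for $r$ via Hubble’s law (at low $z$):
$$z = \frac{\lambda - \lambda_0}{\lambda_0} = \frac{v_{\rm Hub} + v_{\rm pec}}{c} \quad\rightarrow\quad cz = H_0 r + v_{\rm pec}$$
For $z \gtrsim 1$, must use the full luminosity distance-redshift law from GR; depends on $H_0$, $\Omega_m$, $\Omega_\Lambda$. . .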
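As a rough illustration of the low-z relation, the sketch below converts a redshift to a distance estimate and shows the distance error that an unmodeled peculiar velocity would introduce. The value of `h0` (70 km/s/Mpc), the 300 km/s peculiar velocity, and the function names are assumptions chosen for illustration only.

```python
C_KM_S = 299_792.458  # speed of light in km/s

def hubble_distance_mpc(z, h0=70.0):
    """Low-z distance estimate r = c z / H0, ignoring peculiar velocity.
    h0 = 70 km/s/Mpc is an assumed illustrative value."""
    return C_KM_S * z / h0

def distance_error_mpc(v_pec, h0=70.0):
    """Distance error from an unmodeled peculiar velocity v_pec (km/s),
    which follows from cz = H0 r + v_pec."""
    return v_pec / h0

z = 0.01
r_est = hubble_distance_mpc(z)
err = distance_error_mpc(300.0)  # hypothetical 300 km/s peculiar velocity
print(r_est, err, err / r_est)
```

Because the error is fixed in absolute terms, its fractional size shrinks with distance, so peculiar velocities matter most for nearby sources.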
What We Really Want
Physics works in phase space: positions and velocities.
We’d really like to infer $\rho(\mathbf{r}, \mathbf{v})$: all the issues of inferring $n(\mathbf{r})$, plus issues from imperfect measurement of $\mathbf{v}$.
Applications:
• Stars in Milky Way — Galactic dynamics, dark matter
• Stars in nearby dwarf galaxies — dark matter on small scales
• Galaxy proper motions — large scale structure
We’ll focus on position + luminosity; many issues we’ll discuss are only more important for upcoming peculiar velocity and parallax surveys (6dF, Gaia).
Diverse Units
Radio, x-ray & γ-ray surveys, and surveys of other quanta (cosmic rays, grav’l radiation, neutrinos) use energy units directly ($L$, $F$).
Optical & IR surveys use absolute magnitude $M$ instead of $L$:
$$M \equiv -2.5 \log_{10}\frac{L}{L_{\rm fid}}, \qquad f_M(M; r) = \text{pdf for } M \text{ given } r$$
and apparent magnitude $m$ instead of $F$:
$$m \equiv -2.5 \log_{10}\frac{F}{F_{\rm fid}} = -2.5 \log_{10}\frac{L}{4\pi r^2 F_{\rm fid}} = M + \mu$$
with distance modulus $\mu$ instead of $r$:
$$\mu = 5 \log_{10}\frac{r}{10\,{\rm pc}} \quad \text{(stars)}$$
$$\phantom{\mu} = 5 \log_{10}\frac{r}{{\rm Mpc}} + 25 \quad \text{(galaxies)}$$
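The two forms of the distance modulus are the same quantity in different units. A minimal sketch, with hypothetical function names and an illustrative M = -21 galaxy at 100 Mpc, shows the conversion (no extinction or k-correction):

```python
import math

def distance_modulus(r_pc):
    """mu = 5 log10(r / 10 pc), with r in parsecs."""
    return 5.0 * math.log10(r_pc / 10.0)

def apparent_magnitude(abs_mag, r_pc):
    """m = M + mu (no extinction or k-correction)."""
    return abs_mag + distance_modulus(r_pc)

mu_1mpc = distance_modulus(1.0e6)             # 1 Mpc expressed in pc
m_example = apparent_magnitude(-21.0, 1.0e8)  # hypothetical galaxy at 100 Mpc
print(mu_1mpc, m_example)
```

The value of `mu_1mpc` is the constant 25 that appears in the galaxy form of the formula.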
The Three-Halves (or Five-Halves) Law
Assumptions
• Euclidean space: $F = L/(4\pi r^2)$
• Homogeneous and isotropic: $n(\mathbf{r}) = n_0$
• Standard candles: $f_L(L; r) = \delta(L - L_0)$
Flux distribution
A precise flux measurement → $r(F) = \left(\dfrac{L_0}{4\pi F}\right)^{1/2}$
# with flux > F = # closer than r(F):
$$N_>(F) = \frac{4\pi}{3}\,[r(F)]^3\, n_0 \;\propto\; F^{-3/2}$$
Differential distribution (surf. dens. per unit flux & steradian):
$$\Sigma(F) = -\frac{1}{4\pi}\,\frac{dN_>}{dF} \;\propto\; F^{-5/2}$$
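The three-halves law can be checked with a small Monte Carlo experiment. This is a sketch under the slide's assumptions (Euclidean space, uniform density, standard candles); the sample size, seed, thresholds, and function names are arbitrary illustrative choices.

```python
import math
import random

def simulate_fluxes(n_sources, r_max, l0=1.0, seed=0):
    """Fluxes of standard candles placed uniformly in a sphere of radius r_max."""
    rng = random.Random(seed)
    fluxes = []
    for _ in range(n_sources):
        # (1 - u) lies in (0, 1], so r > 0; the cube root makes r uniform in volume.
        r = r_max * (1.0 - rng.random()) ** (1.0 / 3.0)
        fluxes.append(l0 / (4.0 * math.pi * r * r))
    return fluxes

def count_brighter(fluxes, f_min):
    """Cumulative count N_>(F): number of sources with flux above f_min."""
    return sum(1 for f in fluxes if f > f_min)

fluxes = simulate_fluxes(200_000, r_max=1.0)
f_edge = 1.0 / (4.0 * math.pi)       # flux of a source at r_max
f1, f2 = 2.0 * f_edge, 8.0 * f_edge  # thresholds well inside the volume
slope = (math.log(count_brighter(fluxes, f2)) - math.log(count_brighter(fluxes, f1))) / (
    math.log(f2) - math.log(f1)
)
print(slope)  # close to -1.5 up to sampling noise
```

Thresholds fainter than `f_edge` would reach the edge of the simulated volume, where the counts flatten and the power law no longer holds.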
Generalizing:
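Fundamental Equation of Stellar Statistics
$\Sigma$ = density of sources wrt observables (direction, flux or magnitude, . . . )
$$p(F \text{ in } dF,\ \Omega \text{ in } d\Omega) = \Sigma(F, \Omega)\,dF\,d\Omega$$
[Σ] = #/(unit sr × unit flux) or #/(sqr degree × unit mag). . .
In spherical coordinates $(r, \theta, \phi)$, volume element is $dV = r^2\,dr\,d\Omega$ with $d\Omega = \sin\theta\,d\theta\,d\phi$
Use law of total probability to calculate $\Sigma(F, \Omega)$ from luminosity function:
$$\begin{aligned}
\Sigma(F,\Omega) &= \int dr\, r^2 \int dL\; p(r, \Omega, L, F) \\
&= \int dr\, r^2 \int dL\; p(r, \Omega, L)\, p(F \mid r, L) \\
&= \int dr \int dL\; r^2\, n(\mathbf{r})\, f_L(L; \mathbf{r})\, \delta\!\left(F - \frac{L}{4\pi r^2}\right)
\end{aligned}$$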
Flux and magnitude versions
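$$\begin{aligned}
\Sigma(F,\Omega) &= \int dr \int dL\; r^2\, n(\mathbf{r})\, f_L(L; \mathbf{r})\, \delta\!\left(F - \frac{L}{4\pi r^2}\right) \\
&= 4\pi \int dr\; r^4\, n(\mathbf{r})\, f_L(4\pi r^2 F; \mathbf{r})
\end{aligned}$$
$$\begin{aligned}
\Sigma(m,\Omega) &= \int dr \int dM\; r^2\, n(\mathbf{r})\, f_M(M; \mathbf{r})\, \delta[m - (M + \mu)] \\
&= \int dr\; r^2\, n(\mathbf{r})\, f_M(m - \mu(r); \mathbf{r})
\end{aligned}$$
If either the density or luminosity function is known, and if Σ is accurately measured, this is a Fredholm integral equation.
But Σ is sampled (incompletely), often with measurement error, and usually both n and f are uncertain.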
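The magnitude form of the integral is easy to evaluate numerically in the forward direction. The sketch below assumes a Gaussian $f_M$ with mean -21 and standard deviation 0.3, distances in Mpc, and a homogeneous density with no cutoff; the function names, the integration grid, and the upper limit `r_max` are illustrative choices. Under these assumptions the counts should grow by a factor of $10^{0.6}$ per magnitude, the magnitude analogue of the three-halves law.

```python
import math

def gaussian_pdf(x, mean, sigma):
    return math.exp(-0.5 * ((x - mean) / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))

def mu_of_r(r_mpc):
    """Distance modulus for r in Mpc: mu = 5 log10(r / Mpc) + 25."""
    return 5.0 * math.log10(r_mpc) + 25.0

def sigma_m(m, density, m_mean=-21.0, m_sigma=0.3, r_max=2000.0, n_steps=100_000):
    """Sigma(m) = int dr r^2 n(r) f_M(m - mu(r)), midpoint rule, per steradian."""
    dr = r_max / n_steps
    total = 0.0
    for i in range(n_steps):
        r = (i + 0.5) * dr
        total += r * r * density(r) * gaussian_pdf(m - mu_of_r(r), m_mean, m_sigma)
    return total * dr

def uniform(r):
    """Hypothetical homogeneous density in arbitrary units."""
    return 1.0

ratio = sigma_m(14.0, uniform) / sigma_m(13.0, uniform)
print(ratio, 10 ** 0.6)
```

Swapping `uniform` for a density that drops off at large r shows how the counts depart from the $10^{0.6m}$ growth, which is the signature the inverse problem tries to exploit.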
Visualizing the Integral
Luminosity: $M \sim \mathrm{Norm}(-21, 0.3^2)$
Density: Uniform to $r = 100$; linear drop
Curve has constant $m = 13.25 \pm 0.25$
Indicators
Indicators are additional observables, σ, that help make r and L identifiable ⇒ unravel the integral.
Two classes:
• Direct: Knowing σ → knowing either r or L
• Stochastic: r or L are correlated with σ
Several types (usually all called “distance indicators”):
• Distance indicators: p(r |σ)
• Luminosity indicators: p(L|σ)
• Size indicators: p(D|σ)→ r via geometry
Direct Distance Indicator: Parallax
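[Figure: parallax geometry. Earth's orbital motion (baseline 1 AU) makes the target object appear to move against distant objects; the parallax angle π is set by the target's distance r.]
Parallax directly measures the distance to nearby stars:
$$\tan\pi = \frac{1\,{\rm AU}}{r} \quad\rightarrow\quad \varpi \equiv \frac{\pi}{1\ {\rm arcsec}} \approx \frac{1\,{\rm pc}}{r}, \qquad r = \frac{1\,{\rm pc}}{\varpi}$$
“Parallax” is sometimes used as a synonym for distance.
Similar geometric considerations → orbits of minor planets.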
Direct Distance Indicator: Redshift
Redshift $z$ lets you infer $r$ via Hubble’s law (at low $z$):
$$z = \frac{\lambda - \lambda_0}{\lambda_0} = \frac{v_{\rm Hub} + v_{\rm pec}}{c} \quad\rightarrow\quad cz = H_0 r + v_{\rm pec}$$
For $z \gtrsim 1$, must use the full luminosity distance-redshift law from GR; depends on $H_0$, $\Omega_m$, $\Omega_\Lambda$. . .
Complications:
• Peculiar velocity → “scatter” at low z
• Dependence on (uncertain) cosmology
• When inferred indirectly (“photo-z”) uncertainties may be large/complex
Stochastic Luminosity Indicators
Measure a source property σ allowing statistical inference of L via:
$p(L \mid \sigma) = g_\sigma(L)$, hopefully narrow ($\lesssim 30\%$ is good!)
Examples:
σ = Color/spectral type of star (H-R diagram, “photometric parallax”)
  = Period & color of periodic variable star (Period-luminosity rel’n)
  = Asymptotic rot’n velocity of spiral galaxy (Tully-Fisher)
  = Velocity dispersion & angular size of elliptical galaxy (Fundamental plane)
  = Shape & color of SN Ia light curve
  . . .
Can consider measurement of σ → estimate of L with “measurement error” (real σ msmt. error (noise) may compound this)
Inferential Goals
• Estimate shape of $\Sigma(m)$ (no aux. info)
• Estimate characteristics $r$, $L$ for each object
• Estimate $g_\sigma(L)$ (“calibration”)
• Estimate $n(\mathbf{r})$ with $g_\sigma(L)$ known
• Estimate $f_L(L)$ for entire population
• Detect/estimate evolution, $f_L(L; \mathbf{r})$
• Jointly estimate $n(\mathbf{r})$, $f_L(L; \mathbf{r})$
• Estimate cosmological parameters ($n$ and $f_L$ are nuisances)
• . . .
Scatter Biases in Univariate Distributions
“Eddington Bias”
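[Figure: two panels, n vs. true r and n vs. estimated r̂, illustrating how the r uncertainty scatters sources between ranges]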
“A series of quantities are measured and classified in equal ranges.
Each measure has a known uncertainty. On account of the errors of
measurement some quantities are put into the wrong ranges. If the
true number in a range is greater than those in the adjacent ranges,
we should expect more observations to be scattered out of the range
than into it, so that the observed number will need a positive
correction.” (Jeffreys 1938)
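Jeffreys's description can be reproduced with a toy simulation. The sketch below assumes true values drawn from Norm(0, 1), so the bin [-0.5, 0.5) holds more sources than its neighbours, and Gaussian measurement error of standard deviation 0.5; the population, bin, noise level, and names are all illustrative assumptions.

```python
import random

def central_bin_counts(n_sources=100_000, noise_sigma=0.5, seed=1):
    """Count sources in the bin [-0.5, 0.5) before and after adding Gaussian
    measurement error. True values ~ Norm(0, 1) is an assumed toy population
    whose counts peak in this bin."""
    rng = random.Random(seed)
    true_count = 0
    observed_count = 0
    for _ in range(n_sources):
        x_true = rng.gauss(0.0, 1.0)
        x_obs = x_true + rng.gauss(0.0, noise_sigma)
        true_count += -0.5 <= x_true < 0.5
        observed_count += -0.5 <= x_obs < 0.5
    return true_count, observed_count

true_count, observed_count = central_bin_counts()
print(true_count, observed_count)
```

More sources scatter out of the peak bin than into it, so the observed count falls below the true count and needs the positive correction the quote describes.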
Luminosity Calibration via Parallax
“Lutz-Kelker Bias”
Source msmt. complications
• Parallax error ($\lambda \equiv \sigma/\varpi$)
• Flux error
• Transformation bias ($\varpi \rightarrow r$)
• Density law (prior)
Selection complications
• Magnitude (flux) truncation/thinning
• Parallax censoring
• Usually “soft” (random)
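The interplay of parallax error and the density law can be seen in a small simulation. This is a sketch, assuming stars uniform in volume out to 200 pc, Gaussian parallax errors of 0.004 arcsec, and selection on observed parallax near 0.02 arcsec; every number and name here is a hypothetical choice, and magnitude selection is ignored.

```python
import random

def mean_true_parallax(obs_lo, obs_hi, sigma, r_max=200.0, n_stars=200_000, seed=2):
    """Mean true parallax (arcsec) of stars whose observed parallax falls in
    [obs_lo, obs_hi]. Stars are uniform in volume out to r_max parsecs, with
    Gaussian parallax errors; all values are illustrative assumptions."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n_stars):
        r = r_max * (1.0 - rng.random()) ** (1.0 / 3.0)  # pc, uniform in volume
        true_plx = 1.0 / r
        obs_plx = true_plx + rng.gauss(0.0, sigma)
        if obs_lo <= obs_plx <= obs_hi:
            total += true_plx
            count += 1
    return total / count, count

mean_plx, n_selected = mean_true_parallax(0.019, 0.021, sigma=0.004)
print(mean_plx, n_selected)
```

Because there is more volume, and hence more stars, at larger distances, stars scattered into the selection window from smaller true parallaxes outnumber those scattered in from larger ones, so the mean true parallax comes out below the observed value.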
Distance Estimation via Luminosity Indicator
“Malmquist Bias”
Source msmt. complications
• Indicator scatter
• Transformation bias
• Flux error
Selection complications
• Magnitude (flux) truncation/thinning
• Usually “soft” (random)
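A hard magnitude cut on a population with luminosity scatter gives the simplest version of this bias. The sketch below assumes uniform density, Gaussian absolute magnitudes, a sharp apparent-magnitude limit, and no flux error; under those assumptions the selected sources are brighter on average than the population by $0.6 \ln(10)\,\sigma^2 \approx 1.382\,\sigma^2$ mag. The parameter values and function name are hypothetical.

```python
import math
import random

def mean_selected_abs_mag(m_lim=5.0, m0=0.0, sigma=0.5, r_max=300.0,
                          n_sources=200_000, seed=3):
    """Mean absolute magnitude of sources passing a hard apparent-magnitude cut.
    Sources are uniform in volume (r in pc) with M ~ Norm(m0, sigma^2);
    all parameter values are illustrative assumptions."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(n_sources):
        r = r_max * (1.0 - rng.random()) ** (1.0 / 3.0)
        abs_mag = rng.gauss(m0, sigma)
        app_mag = abs_mag + 5.0 * math.log10(r / 10.0)
        if app_mag < m_lim:
            total += abs_mag
            count += 1
    return total / count, count

mean_mag, n_selected = mean_selected_abs_mag()
classical_shift = -0.6 * math.log(10.0) * 0.5 ** 2  # -1.382 sigma^2 for sigma = 0.5
print(mean_mag, classical_shift, n_selected)
```

The volume radius `r_max` is chosen large enough that essentially every source brighter than the limit lies inside it; a smaller volume would truncate the bright tail and reduce the shift.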
Distance Errors Due to Indicator Scatter
Average true radius $\langle r \rangle$ of sources assigned $\hat{r}$
[Figure: two panels plotted against assigned distance $\hat{r}$ (0 to 160): fractional offset δr/r (−0.2 to 0.2) and average true r (0 to 160)]
Analyzing Surveys: Two Classes of Methods
Inverse methods
• Try to “correct” or “debias” data via adjustments/weights
• Focus on moments & empirical dist’n function (EDF)
Forward modeling methods
• Try to predict data by applying survey process to model
• Focus on likelihood
(Analogous to “design-based” vs. “model-based” methods in survey sampling)
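As a very small instance of the likelihood-focused approach, the sketch below fits the slope of a power-law brightness distribution, $N_>(F) \propto F^{-\alpha}$, to fluxes truncated at a known limit, using the standard maximum-likelihood estimator for a Pareto distribution. It assumes precise fluxes and a sharp, known threshold; the simulated data, seed, and function names are illustrative and not a method taken from the talk.

```python
import math
import random

def simulate_power_law_fluxes(n, alpha, f_min, seed=4):
    """Draw fluxes with N_>(F) proportional to F^-alpha above f_min
    (inverse-CDF sampling)."""
    rng = random.Random(seed)
    return [f_min * (1.0 - rng.random()) ** (-1.0 / alpha) for _ in range(n)]

def mle_slope(fluxes, f_min):
    """Maximum-likelihood slope for the truncated power-law (Pareto) model:
    alpha_hat = N / sum(ln(F_i / f_min))."""
    kept = [f for f in fluxes if f >= f_min]
    return len(kept) / sum(math.log(f / f_min) for f in kept)

fluxes = simulate_power_law_fluxes(20_000, alpha=1.5, f_min=1.0)
alpha_hat = mle_slope(fluxes, f_min=1.0)
print(alpha_hat)  # near 1.5, the three-halves slope, up to sampling noise
```

The truncation enters through the model for the data (fluxes exist only above `f_min`) rather than through a correction applied to binned counts, which is the distinction the slide draws between forward modeling and inverse methods.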
Seminal Work
Eddington (1913, 1940) & Jeffreys (1938)
• Measurement error in univariate dist’ns (“density deconvolution/demixing”)
• Adjusted EDF/estimates vs. likelihood (Eddington vs. Jeffreys)
Malmquist (1920)
• Correct (r, L) dist’ns for truncation by adjusting moment estimators, assuming uniform n and Gaussian Φ(M)
Lutz-Kelker (1973)
• Correct parallax distances for scatter by adjusting moment estimators, assuming uniform n and Gaussian Φ(M)
Recent Developments & Open Issues
Brightness distributions
✓ Parametric modelling (via marginalizing latent variables)
✓ Nonparametric estimates with truncation/censoring; no meas. error
× Nonparametric estimates with scatter and truncation
(r, L) distributions
✓ Parametric f(L) given n(r), or n(r) given f(L)
✓ As above, nonparametric with no meas. error
× Everything else (including ext. to phase space)!