Beyond Mean Regression
Thomas Kneib
Lehrstuhl für Statistik, Georg-August-Universität Göttingen
8.3.2013 Innsbruck
Thomas Kneib Introduction
Introduction
• One of the top ten reasons to become a statistician (according to Friedman, Friedman & Amoo):
Statisticians are mean lovers.
⇒ Focus on means, in particular in regression models, to reduce complexity.
• Obviously, a mean is not sufficient to fully describe a distribution.
Beyond Mean Regression 1
• Usual regression models are based on data (yi, zi) for a continuous response variable y and covariates z:
yi = ηi + εi,
where ηi is a regression predictor formed in terms of the covariates zi.
• Assumptions on the error term:
E(εi) = 0, Var(εi) = σ²,
or εi ∼ N(0, σ²).
• The assumptions on the error term imply the following properties of the response distribution:
– The predictor determines the expectation of the response:
E(yi|zi) = ηi.
– Homoscedasticity of the response:
Var(yi|zi) = σ².
– Parallel quantile curves of the response (if the errors are also normal):
Qτ(yi|zi) = ηi + zτ·σ, where zτ is the τ-quantile of the standard normal distribution.
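The parallel-quantile property can be checked numerically. The following minimal sketch (illustrative, not from the talk) uses Python's `statistics.NormalDist`; the values of `eta` and `sigma` are arbitrary assumptions:

```python
from statistics import NormalDist

def normal_quantile_curve(eta, sigma, tau):
    """tau-quantile of y given z under y = eta + eps, eps ~ N(0, sigma^2)."""
    z_tau = NormalDist().inv_cdf(tau)  # tau-quantile of the standard normal
    return eta + z_tau * sigma

# Parallel curves: the gap between two quantile curves does not depend on eta.
gap_low = normal_quantile_curve(0.0, 2.0, 0.9) - normal_quantile_curve(0.0, 2.0, 0.1)
gap_high = normal_quantile_curve(10.0, 2.0, 0.9) - normal_quantile_curve(10.0, 2.0, 0.1)
assert abs(gap_low - gap_high) < 1e-9
```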
• Why could this be problematic?
– The variance of the responses may depend on covariates (heteroscedasticity).
– Other higher-order characteristics (skewness, kurtosis, . . . ) of the responses may depend on covariates.
– Generic interest in extreme observations or the complete conditional distribution of the response.
• Example: Munich rental guide (illustrative application in this talk).
– Explain the net rent for a specific flat in terms of covariates such as living area or year of construction.
– Published to give reference intervals of usual rents for both tenants and landlords.
⇒ We are not interested in average rents but rather in an interval covering typical rents.
[Figure: scatterplots of rent in Euro against living area (20 to 160 sqm) and against year of construction (1920 to 2000).]
• Some further examples:
– Analysing childhood BMI patterns in (post-)industrialized countries, where interest is mainly on extreme forms of overweight (obesity).
– Studying covariate effects on extreme forms of malnutrition in developing countries.
– Efficiency estimation in agricultural production, where interest is on evaluating above-average performance of farms.
– Modelling gas flow networks, where the behavior of the network in high or low demand situations shall be studied.
• More flexible regression approaches considered in the following:
– Regression models for location, scale and shape.
– Quantile regression.
– Expectile regression.
• Regression models for location, scale and shape:
– Retain the assumption of a specific error distribution but allow covariate effects not only on the mean.
– Simplest example: regression for the mean and variance of a normal distribution, where
yi = ηi1 + exp(ηi2)εi, εi ∼ N(0, 1),
such that E(yi|zi) = ηi1 and Var(yi|zi) = exp(ηi2)².
– In general: specify a distribution for the response, where (potentially) all parameters are related to predictors.
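The location-scale construction can be verified by simulation. A minimal sketch (not from the talk; the predictor values `eta1`, `eta2` are arbitrary assumptions for a single covariate setting):

```python
import math
import random

random.seed(1)

eta1, eta2 = 2.0, 0.5  # hypothetical location predictor and log-scale predictor

# Simulate y = eta1 + exp(eta2) * eps with eps ~ N(0, 1).
n = 200_000
y = [eta1 + math.exp(eta2) * random.gauss(0.0, 1.0) for _ in range(n)]

mean_y = sum(y) / n
var_y = sum((v - mean_y) ** 2 for v in y) / n

# Empirical mean and standard deviation match eta1 and exp(eta2).
assert abs(mean_y - eta1) < 0.03
assert abs(math.sqrt(var_y) - math.exp(eta2)) < 0.03
```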
• Quantile and expectile regression:
– Drop the parametric assumption for the error / response distribution and instead estimate separate models for different asymmetries τ ∈ (0, 1):
yi = ηiτ + εiτ.
– Instead of assuming E(εiτ) = 0, we can for example assume
P(εiτ ≤ 0) = τ,
i.e. the τ-quantile of the error term is zero.
– Yields a regression model for the quantiles of the response.
– A dense set of quantiles completely characterizes the conditional distribution of the response.
– Expectiles are a computationally attractive alternative to quantiles.
• Estimated quantile curves for the Munich rental guide with a linear effect of living area and a quadratic effect of year of construction.
– Homoscedastic linear model:
[Figure: estimated quantile curves of rent in Euro against living area and year of construction.]
– Heteroscedastic linear model:
[Figure: estimated quantile curves of rent in Euro against living area and year of construction.]
– Quantile regression:
[Figure: estimated quantile curves of rent in Euro against living area and year of construction.]
• Usually, modern regression data contain more complex structures such that linear predictors are not enough.
• For example, in the Munich rental guide
– the effects of living area and size of the flat may be of complex nonlinear form (instead of simply polynomial) and
– a spatial effect based on the subquarter information may be included to capture effects of missing covariates and spatial correlation.
⇒ Consider semiparametric extensions.
Overview for the Rest of the Talk
• Semiparametric Predictor Specifications.
• More on Models:
– Generalized Additive Models for Location, Scale and Shape.
– Quantile Regression.
– Expectile Regression.
• Inferential Procedures & Comparison of the Approaches.
Semiparametric Regression
• Semiparametric regression provides a generic framework for flexible regression models with predictor
η = β0 + f1(z) + . . . + fr(z)
where f1, . . . , fr are generic functions of the covariate vector z.
• Types of effects:
– Linear effects: f(z) = x′β.
– Nonlinear, smooth effects of continuous covariates: f(z) = f(x).
– Varying coefficients: f(z) = uf(x).
– Interaction surfaces: f(z) = f(x1, x2).
– Spatial effects: f(z) = fspat(s).
– Random effects: f(z) = bc with cluster index c.
• Generic model description based on
– a design matrix Zj, such that the vector of function evaluations fj = (fj(z1), . . . , fj(zn))′ can be written as
fj = Zjγj,
– a quadratic penalty term
pen(fj) = pen(γj) = γ′jKjγj
which operationalises smoothness properties of fj.
• From a Bayesian perspective, the penalty term corresponds to a multivariate Gaussian prior
p(γj) ∝ exp(−γ′jKjγj / (2δ²j)).
• Estimation then relies on a penalised fit criterion, e.g.
∑i=1..n (yi − ηi)² + ∑j=1..r λj γ′jKjγj
with smoothing parameters λj ≥ 0.
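The penalised criterion above has the closed-form minimiser γj = (Z′jZj + λjKj)⁻¹Z′jy per component. A minimal sketch in plain Python (illustrative toy design and data, not the talk's implementation):

```python
# Penalised least squares gamma = (Z'Z + lambda * K)^(-1) Z'y on a toy design.

def transpose(A):
    return [list(r) for r in zip(*A)]

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [x - f * y for x, y in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]

def penalised_ls(Z, y, K, lam):
    Zt = transpose(Z)
    ZtZ = matmul(Zt, Z)
    p = len(ZtZ)
    A = [[ZtZ[i][j] + lam * K[i][j] for j in range(p)] for i in range(p)]
    Zty = [sum(Zt[i][k] * y[k] for k in range(len(y))) for i in range(p)]
    return solve(A, Zty)

# Toy data with two coefficients and a first-order difference penalty K = D'D.
Z = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
y = [0.1, 1.2, 1.9, 3.1]
D = [[-1.0, 1.0]]  # first-order difference matrix
K = matmul(transpose(D), D)

gamma_ols = penalised_ls(Z, y, K, 0.0)  # lambda = 0: ordinary least squares
gamma_pen = penalised_ls(Z, y, K, 0.5)  # penalty shrinks gamma[1] - gamma[0]
```

With `lam = 0` the estimate reduces to ordinary least squares; increasing `lam` shrinks the penalised difference of adjacent coefficients towards zero.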
• Example 1. Penalised splines for nonlinear effects f(x):
– Approximate f(x) in terms of a linear combination of B-spline basis functions
f(x) = ∑k γk Bk(x).
– Large variability in the estimates corresponds to large differences in adjacent coefficients, yielding the penalty term
pen(γ) = ∑k (∆dγk)² = γ′D′dDdγ
with difference operator ∆d and difference matrix Dd of order d.
– The corresponding Bayesian prior is a random walk of order d, e.g.
γk = γk−1 + uk, γk = 2γk−1 − γk−2 + uk
with uk i.i.d. N(0, δ²).
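The difference matrix Dd can be built by applying first differences d times to an identity matrix. A minimal sketch (illustrative, with made-up coefficient values):

```python
# Build the d-th order difference matrix D_d and check that the penalty
# gamma' D_d' D_d gamma equals the sum of squared d-th differences.

def diff_matrix(k, d):
    """d-th order difference matrix for k coefficients ((k - d) x k)."""
    D = [[float(i == j) for j in range(k)] for i in range(k)]  # identity
    for _ in range(d):
        D = [[D[i + 1][j] - D[i][j] for j in range(k)] for i in range(len(D) - 1)]
    return D

gamma = [1.0, 2.0, 4.0, 7.0, 11.0]
D2 = diff_matrix(5, 2)
Dg = [sum(D2[i][j] * gamma[j] for j in range(5)) for i in range(len(D2))]
pen = sum(v * v for v in Dg)
# Second differences of 1, 2, 4, 7, 11 are 1, 1, 1, so pen = 3.
assert abs(pen - 3.0) < 1e-9
```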
• Example 2. Markov random fields for the estimation of spatial effects based on regional data:
– Estimate a separate regression coefficient γs for each region, i.e. f = Zγ with
Z[i, s] = 1 if observation i belongs to region s, and 0 otherwise.
– Penalty term based on differences of neighboring regions:
pen(γ) = ∑s ∑r∈N(s) (γs − γr)² = γ′Kγ
where N(s) is the set of neighbors of region s and K is a penalty matrix derived from the adjacency structure of the regions.
– An equivalent Bayesian prior structure is obtained based on Gaussian Markov random fields.
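The Markov random field penalty is easy to evaluate directly from a neighborhood structure. A minimal sketch on an invented three-region map (not from the talk):

```python
# MRF penalty on a tiny map: regions 0-1 are neighbours, regions 1-2 are
# neighbours. The double sum over s and r in N(s) counts each pair twice.

neighbours = {0: [1], 1: [0, 2], 2: [1]}
gamma = [0.5, 1.0, 3.0]  # hypothetical regional effects

pen = sum((gamma[s] - gamma[r]) ** 2
          for s in neighbours for r in neighbours[s])

# 2 * ((0.5 - 1.0)^2 + (1.0 - 3.0)^2) = 2 * (0.25 + 4.0) = 8.5
assert abs(pen - 8.5) < 1e-9
```

Large differences between neighbouring effects are penalised, which encourages spatially smooth estimates.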
Inferential Procedures
• For each of the three model classes discussed in the following, we will consider three potential avenues for inference:
– Direct optimization of a fit criterion (e.g. maximum likelihood estimation for GAMLSS).
– Bayesian approaches.
– Functional gradient descent boosting.
• Functional gradient descent boosting:
– Define the estimation problem in terms of a loss function ρ (e.g. the negative log-likelihood).
– Use the negative gradients of the loss function evaluated at the current fit as a measure for lack of fit.
– Iteratively fit simple base-learning procedures to the negative gradients to update the model fit.
– Componentwise updates of only the best-fitting model component yield automatic variable selection and model choice.
– For semiparametric regression, penalized least squares estimates provide suitable base-learners.
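The steps above can be sketched for the simplest case, L2 loss with univariate linear base-learners (a toy illustration, not the talk's implementation; data and step size are arbitrary assumptions):

```python
# Componentwise functional gradient boosting with L2 loss: in each step, fit
# a simple base-learner (univariate least squares through the origin) to the
# negative gradient (here: the residuals) and update only the best component.

def componentwise_l2_boost(X, y, steps=200, nu=0.1):
    n, p = len(X), len(X[0])
    coef = [0.0] * p
    fit = [0.0] * n
    for _ in range(steps):
        resid = [yi - fi for yi, fi in zip(y, fit)]  # negative gradient of L2 loss
        best_j, best_sse, best_beta = 0, float("inf"), 0.0
        for j in range(p):
            xj = [row[j] for row in X]
            beta = sum(a * b for a, b in zip(xj, resid)) / sum(a * a for a in xj)
            sse = sum((r - beta * x) ** 2 for r, x in zip(resid, xj))
            if sse < best_sse:  # componentwise selection of the best fit
                best_j, best_sse, best_beta = j, sse, beta
        coef[best_j] += nu * best_beta  # damped update of one component only
        fit = [sum(c * x for c, x in zip(coef, row)) for row in X]
    return coef

# y depends (almost) only on the first covariate; boosting should select it.
X = [[1.0, 0.3], [2.0, -0.5], [3.0, 0.1], [4.0, 0.4], [5.0, -0.2]]
y = [2.0, 4.1, 5.9, 8.2, 9.9]
coef = componentwise_l2_boost(X, y)
```

Because only the best-fitting component is updated in each step, covariates that never get selected keep a zero coefficient, which is the variable-selection effect mentioned above.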
Generalized Additive Models for Location, Scale and Shape
• GAMLSS provide a unified framework for semiparametric regression models in the case of complex response distributions depending on up to four parameters (µi, σi, νi, ξi), where usually
– µi is the location parameter,
– σi is the scale parameter, and
– νi and ξi are shape parameters determining for example skewness or kurtosis.
• Each parameter is related to a regression predictor via a suitable response function, i.e.
µi = h1(ηi,µ), σi = h2(ηi,σ), . . .
• A very broad class of distributions is supported for both discrete and continuous responses.
• Most important examples for continuous responses:
– Two-parameter normal distribution (location and scale).
– Three-parameter power exponential distribution (location, scale and kurtosis).
– Three-parameter t distribution (location, scale and degrees of freedom).
– Three-parameter gamma distribution (location, scale and shape).
– Four-parameter Box-Cox power distribution (location, scale, skewness and kurtosis).
• Direct optimization:
– For GAMLSS, the likelihood is available due to the explicit assumption made for the distribution of the response.
– Maximization can be achieved by penalized iteratively weighted least squares (IWLS) estimation.
– Estimation and choice of the smoothing parameters is challenging, at least for complex models.
• Bayesian inference:
– Inference based on Markov chain Monte Carlo (MCMC) simulations is in principle straightforward but requires careful choice of the proposal densities.
– Promising results have been obtained based on IWLS proposals.
– Smoothing parameter choice is immediately included.
• Boosting:
– Due to the multiple predictors, the usual boosting framework has to be adapted but basically still works.
• Results for the Munich rental guide obtained with an additive model for location and scale:
[Figure: estimated effects on the mean of area in sqm and year of construction.]
[Figure: estimated effects on the standard deviation of area in sqm and year of construction.]
Quantile Regression
• The theoretical τ-quantile qτ of a continuous random variable is characterized by
P(Y ≤ qτ) ≥ τ and P(Y ≥ qτ) ≥ 1 − τ.
• Estimation of quantiles based on i.i.d. samples y1, . . . , yn can be accomplished by
q̂τ = argminq ∑i=1..n wτ(yi, q)|yi − q|
with asymmetric weights
wτ(yi, q) = 1 − τ if yi < q, 0 if yi = q, and τ if yi > q.
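This minimisation can be checked by brute force on a small sample. A minimal sketch (illustrative data, not from the talk):

```python
# The minimiser of the asymmetric absolute loss over a sample is an empirical
# tau-quantile; here a brute-force search over the sample values themselves.

def w(tau, y, q):
    if y < q:
        return 1.0 - tau
    if y > q:
        return tau
    return 0.0

def quantile_loss(sample, q, tau):
    return sum(w(tau, y, q) * abs(y - q) for y in sample)

sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 10.0]
tau = 0.2
best = min(sample, key=lambda q: quantile_loss(sample, q, tau))
# For tau = 0.2 and n = 10, any point between the 2nd and 3rd order statistic
# minimises the loss, so the search returns one of those boundary values.
assert best in (2.0, 3.0)
```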
• Plot of the weighted losses wτ(y, q)|y − q| (for q = 0)
• Quantile regression starts with the regression model
yi = ηiτ + εiτ .
• Instead of assuming E(εiτ) = 0 as in mean regression, we assume
Fεiτ(0) = P(εiτ ≤ 0) = τ,
i.e. the τ-quantile of the error is zero.
• This implies that the predictor coincides with the τ-quantile of the conditional distribution of the response, i.e.
Fyi(ηiτ) = P(yi ≤ ηiτ) = τ.
• Quantile regression therefore
– is distribution-free since it does not make any specific assumptions on the type of errors,
– does not even require i.i.d. errors, and
– allows for heteroscedasticity.
• Note that each parametric regression model also induces a quantile regression model.
• Example: the heteroscedastic normal model
y ∼ N(η1, exp(η2)²)
yields
qτ = η1 + exp(η2)·zτ.
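This induced quantile can be verified directly against the quantile function of the normal distribution, e.g. with Python's `statistics.NormalDist` (a sketch with arbitrary predictor values):

```python
import math
from statistics import NormalDist

# The tau-quantile induced by the heteroscedastic normal model
# y ~ N(eta1, exp(eta2)^2) is q_tau = eta1 + exp(eta2) * z_tau.
eta1, eta2, tau = 5.0, 0.3, 0.75

z_tau = NormalDist().inv_cdf(tau)          # standard normal tau-quantile
q_tau = eta1 + math.exp(eta2) * z_tau      # induced quantile

# Check against the quantile of the response distribution directly.
direct = NormalDist(mu=eta1, sigma=math.exp(eta2)).inv_cdf(tau)
assert abs(q_tau - direct) < 1e-9
```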
• Direct optimisation:
– Classical estimation is achieved by minimizing
∑i=1..n wτ(yi, ηiτ)|yi − ηiτ| + ∑j=1..p λj pen(fj).
– Can be solved with linear programming as long as the penalties are also linear functionals, e.g. for total variation penalization
pen(fj) = ∫ |f′′j(x)| dx.
– Does not fit well with the class of quadratic penalties we are considering.
– Smoothing parameter selection is still challenging, in particular with multiple smoothing parameters.
• Bayesian inference:
– Although quantile regression is distribution-free, there is an auxiliary error distribution that links ML estimation to quantile regression.
– Assume an asymmetric Laplace distribution for the responses, i.e.
yi ∼ ALD(ηiτ, σ², τ)
with density proportional to
exp(−wτ(yi, ηiτ) |yi − ηiτ| / σ²).
– Maximizing the resulting likelihood
exp(−∑i=1..n wτ(yi, ηiτ) |yi − ηiτ| / σ²)
is equivalent to minimizing the quantile loss criterion.
– A computationally attractive way of working with the ALD in a Bayesian framework is its scale mixture representation.
– If zi | σ² ∼ Exp(1/σ²) and
yi | zi, ηiτ, σ² ∼ N(ηiτ + ξzi, σ²/wi)
with
ξ = (1 − 2τ) / (τ(1 − τ)), wi = 1/(δ²zi), δ² = 2 / (τ(1 − τ)),
then yi is marginally ALD(ηiτ, σ², τ) distributed.
– Allows one to construct efficient Gibbs samplers or variational Bayes approximations to explore the posterior after imputing the zi as additional unknowns.
• Boosting:
– Boosting can be immediately applied in the quantile regression context since it is formulated in terms of a loss function.
– Negative gradients are defined almost everywhere, i.e. no conceptual problems.
• Results for a geoadditive Bayesian quantile regression model:
[Figure: maps of the estimated spatial effects for τ = 0.1, 0.2, 0.5 and 0.9, each on a common scale from −150 to 150.]
[Figure: estimated nonlinear effects f(living area) and f(year of construction) for several quantiles.]
Expectile Regression
• What is expectile regression? Compare the least-squares-type criteria:
– Median regression: minimize ∑i=1..n |yi − ηi|.
– Mean regression: minimize ∑i=1..n (yi − ηi)².
– Quantile regression: minimize ∑i=1..n wτ(yi, ηiτ)|yi − ηiτ|.
– Expectile regression: minimize ∑i=1..n wτ(yi, ηiτ)(yi − ηiτ)².
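For an i.i.d. sample, the τ-expectile minimising the asymmetrically weighted squared loss can be computed by iterating weighted means, a scalar version of the iteratively weighted least squares idea. A minimal sketch (illustrative data, not from the talk):

```python
# The tau-expectile solves e = sum(w_i * y_i) / sum(w_i) with weights
# w_i = tau for y_i > e and w_i = 1 - tau for y_i < e; iterate to a fixed point.

def expectile(sample, tau, iters=100):
    e = sum(sample) / len(sample)  # start from the mean (the 0.5-expectile)
    for _ in range(iters):
        wts = [tau if y > e else (1.0 - tau) if y < e else 0.5 for y in sample]
        e = sum(w * y for w, y in zip(wts, sample)) / sum(wts)
    return e

sample = [1.0, 2.0, 3.0, 4.0, 10.0]
# tau = 0.5 recovers the ordinary mean.
assert abs(expectile(sample, 0.5) - 4.0) < 1e-9
# Larger asymmetry moves the expectile towards the upper tail.
assert expectile(sample, 0.9) > expectile(sample, 0.5)
```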
• Theoretical expectiles are obtained by solving
τ = ∫_{−∞}^{eτ} |y − eτ| fy(y) dy / ∫_{−∞}^{∞} |y − eτ| fy(y) dy = (Gy(eτ) − eτFy(eτ)) / (2(Gy(eτ) − eτFy(eτ)) + (eτ − µ))
where
– fy(·) and Fy(·) denote the density and cumulative distribution function of y,
– Gy(e) = ∫_{−∞}^{e} y fy(y) dy is the partial moment function of y, and
– Gy(∞) = µ is the expectation of y.
• Direct optimization:
– Since the expectile loss is differentiable, estimates for the basis coefficients can be obtained by iterating the weighted least squares update
γ[t+1]jτ = (Z′j W[t]τ Zj + λjKj)⁻¹ Z′j W[t]τ y.
– A combination with mixed model methodology allows one to estimate the smoothing parameters.
• Bayesian inference:
– Similarly as for quantile regression, an asymmetric normal distribution can be defined as an auxiliary distribution for the responses.
– No scale mixture representation is known so far.
– The Bayesian formulation is probably less important since inference is directly tractable.
• Boosting:
– Boosting can be immediately applied in the expectile regression context.
Comparison
• Advantages of GAMLSS:
– One joint model for the distribution of the responses.
– Interpretability of the estimated effects in terms of parameters of the response distribution.
– Quantiles (or expectiles) derived from GAMLSS will always be coherent, i.e. their ordering will be preserved.
– Readily available in both frequentist and Bayesian formulations.
• Disadvantages of GAMLSS:
– Potential for misspecification of the observation model.
– Model checking is difficult in complex settings.
– If quantiles are of ultimate interest, GAMLSS do not provide direct estimates for these.
• Advantages of quantile regression:
– Completely distribution-free approach.
– Easy interpretation in terms of conditional quantiles.
– The Bayesian formulation enables very flexible, fully data-driven semiparametric specifications of the predictor.
• Disadvantages of quantile regression:
– The Bayesian formulation requires an auxiliary error distribution (which will usually be a misspecification).
– The estimated cumulative distribution function is a step function even for continuous data.
– Additional efforts are required to avoid crossing of quantile curves.
• Advantages of expectile regression:
– Computationally simple (iteratively weighted least squares).
– Still allows to characterize the complete conditional distribution of the response.
– Quantiles (or conditional distributions) can be computed based on expectiles.
– Expectiles seem to be more efficient in close-to-Gaussian situations than quantiles.
– Expectile crossing seems to be less of an issue as compared to quantile crossing.
– The estimated expectile curve is smooth.
• Disadvantages of expectile regression:
– Immediate interpretation of expectiles is difficult.
Summary
• There is more than mean regression!
• Semiparametric extensions are also becoming available for models beyond mean regression.
• You can do this at home:
– Quantile regression: R-package quantreg.
– Bayesian quantile regression: BayesX (MCMC) and a forthcoming R-package on variational Bayes approximations (VA).
– GAMLSS: R-packages gamlss and gamboostLSS.
– Expectile regression: R-package expectreg.
• An interesting addition to the models considered: modal regression (yet to be explored).
• Acknowledgements:
– This talk is mostly based on joint work with Nora Fenske, Benjamin Hofner, Torsten Hothorn, Göran Kauermann, Stefan Lang, Andreas Mayr, Matthias Schmid, Linda Schulze Waltrup, Fabian Sobotka, Elisabeth Waldmann and Yu Yue.
– Financial support has been provided by the German Research Foundation (DFG).
• A place called home:
http://www.statoek.wiso.uni-goettingen.de