ABC short course: final chapters
ABC model choice via random forests
1 simulation-based methods in Econometrics
2 Genetics of ABC
3 Approximate Bayesian computation
4 ABC for model choice
5 ABC model choice via random forests: Random forests; ABC with random forests; Illustrations
6 ABC estimation via random forests
7 [some] asymptotics of ABC
Leaning towards machine learning
Main notions:
• ABC-MC seen as learning about which model is most appropriate from a huge (reference) table
• exploiting a large number of summary statistics not an issue for machine learning methods intended to estimate efficient combinations
• abandoning (temporarily?) the idea of estimating posterior probabilities of the models, poorly approximated by machine learning methods, and replacing those by posterior predictive expected loss
[Cornuet et al., 2014, in progress]
Random forests
Technique that stemmed from Leo Breiman's bagging (or bootstrap aggregating) machine learning algorithm for both classification and regression
[Breiman, 1996]
Improved classification performances by averaging over classification schemes of randomly generated training sets, creating a "forest" of (CART) decision trees, inspired by Amit and Geman (1997) ensemble learning
[Breiman, 2001]
Growing the forest
Breiman's solution for inducing random features in the trees of the forest:
• bootstrap resampling of the dataset and
• random subsetting [of size √t] of the covariates driving the classification at every node of each tree
Covariate x_τ that drives the node separation
$$x_\tau \gtrless c_\tau$$
with the separation bound c_τ chosen by minimising entropy or the Gini index
Breiman and Cutler’s algorithm
Algorithm 5 Random forests
for t = 1 to T do   //* T is the number of trees *//
  Draw a bootstrap sample of size n_boot
  Grow an unpruned decision tree
  for b = 1 to B do   //* B is the number of nodes *//
    Select n_try of the predictors at random
    Determine the best split from among those predictors
  end for
end for
Predict new data by aggregating the predictions of the T trees
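As a concrete illustration, a minimal R sketch of this algorithm via the randomForest package (Breiman and Cutler's implementation); the data and tuning values below are invented for the example, not taken from the course.

```r
## Random forest on synthetic data: T trees, about sqrt(p) predictors tried
## at each node, bootstrap resampling, unpruned trees (nodesize = 1).
library(randomForest)

n <- 1000; p <- 10
x <- as.data.frame(matrix(rnorm(n * p), n, p))
y <- factor(rbinom(n, 1, plogis(x[, 1] - x[, 2])))  # toy class labels

rf <- randomForest(x, y,
                   ntree    = 500,              # T, the number of trees
                   mtry     = floor(sqrt(p)),   # n_try predictors per node
                   sampsize = n,                # bootstrap sample size n_boot
                   nodesize = 1)                # fully grown, unpruned trees
predict(rf, x[1:5, ])                           # aggregate the T trees' votes
```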
Subsampling
Due to both large datasets [practical] and theoretical recommendations from Gérard Biau [private communication], from independence between trees to convergence issues, bootstrap samples of much smaller size than the original data size:
$$N = o(n)$$
Each CART tree stops when the number of observations per node is 1: no culling of the branches
ABC with random forests
Idea: starting with
• a possibly large collection of summary statistics (s_{1i}, . . . , s_{pi}) (from scientific theory input to available statistical software, to machine-learning alternatives)
• an ABC reference table involving model index, parameter values and summary statistics for the associated simulated pseudo-data
run the R package randomForest to infer M from (s_{1i}, . . . , s_{pi})
at each step O(√p) indices are sampled at random and the most discriminating statistic is selected, by minimising the entropy or Gini loss
The average of the trees is the resulting summary statistic, a highly non-linear predictor of the model index
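A hedged sketch of this pipeline with the generic randomForest package (the course's own implementation later became the abcrf package); the reference table below is a placeholder, with simulated entries standing in for the model index M and the p summary statistics.

```r
## ABC-RF model choice: learn M from the summaries of the reference table,
## then predict at the observed summaries.
library(randomForest)

N <- 20000; p <- 48
ref <- data.frame(M = factor(sample(1:3, N, replace = TRUE)),  # model index
                  matrix(rnorm(N * p), N, p))                  # summaries
rf <- randomForest(M ~ ., data = ref, ntree = 500)

s_obs <- ref[1, -1]                 # stand-in for the observed summaries
predict(rf, s_obs)                  # (MAP) model index picked by the forest
predict(rf, s_obs, type = "prob")   # tree frequencies: NOT posterior probs
```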
Outcome of ABC-RF
Random forest predicts a (MAP) model index from the observed dataset: the predictor provided by the forest is "sufficient" to select the most likely model, but not to derive the associated posterior probability
• exploit the entire forest by computing how many trees lead to picking each of the models under comparison, but variability too high to be trusted
• frequency of trees associated with the majority model is no proper substitute for the true posterior probability
• and the usual ABC-MC approximation is equally highly variable and hard to assess
Posterior predictive expected losses
We suggest replacing the unstable approximation of
$$P(M = m \mid x^o)$$
with $x^o$ the observed sample and $m$ the model index, by the average of the selection errors across all models given the data $x^o$,
$$P(\hat{M}(X) \neq M \mid x^o)$$
where the pair $(M, X)$ is generated from the predictive
$$\int f(x \mid \theta)\,\pi(\theta, M \mid x^o)\,\mathrm{d}\theta$$
and $\hat{M}(x)$ denotes the random forest (MAP) model predictor
Posterior predictive expected losses
Arguments:
• Bayesian estimate of the posterior error
• integrates the error over the most likely part of the parameter space
• gives an averaged error rather than the posterior probability of the null hypothesis
• easily computed: given an ABC subsample of parameters from the reference table, simulate the pseudo-samples associated with those and derive the error frequency
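In code, that computation could look like the following hedged sketch, where simulate_data(), summaries(), theta_table and dist_to_obs are hypothetical stand-ins for the problem's simulator, summary map, reference-table parameters and distances to the observed summaries.

```r
## Posterior predictive expected loss: frequency of forest misclassifications
## over pseudo-samples simulated from an ABC subsample of the reference table.
abc_idx <- order(dist_to_obs)[1:500]    # ABC subsample: 500 closest entries
errors <- vapply(abc_idx, function(t) {
  x_star <- simulate_data(model = ref$M[t], theta = theta_table[t, ])
  s_star <- summaries(x_star)           # same summary statistics as the RT
  predict(rf, s_star) != ref$M[t]       # selection error indicator
}, logical(1))
mean(errors)                            # posterior predictive error rate
```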
toy: MA(1) vs. MA(2)
Comparing an MA(1) and an MA(2) model:
$$x_t = \varepsilon_t - \vartheta_1 \varepsilon_{t-1}\,[-\vartheta_2 \varepsilon_{t-2}]$$
Earlier illustration using the first two autocorrelations as S(x)
[Marin et al., Stat. & Comp., 2011]
Result #1: the values of p(m|x) [obtained by numerical integration] and p(m|S(x)) [obtained by mixing the ABC outcome and density estimation] differ greatly!
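A small simulation sketch of this toy setting (series length and parameter values are arbitrary choices for illustration):

```r
## Simulate MA(1)/MA(2) series and compute the first two autocorrelations,
## the summary statistics S(x) of the illustration.
sim_ma <- function(theta, n = 100) {
  eps <- rnorm(n + 2)
  eps[3:(n + 2)] - theta[1] * eps[2:(n + 1)] - theta[2] * eps[1:n]
}
S <- function(x) acf(x, lag.max = 2, plot = FALSE)$acf[2:3]

S(sim_ma(c(0.6, 0)))    # MA(1): theta_2 = 0
S(sim_ma(c(0.6, 0.2)))  # MA(2)
```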
toy: MA(1) vs. MA(2)
Difference between the posterior probability of MA(2) given either x or S(x). Blue stands for data from MA(1), orange for data from MA(2)
toy: MA(1) vs. MA(2)
Comparing an MA(1) and an MA(2) model:
$$x_t = \varepsilon_t - \vartheta_1 \varepsilon_{t-1}\,[-\vartheta_2 \varepsilon_{t-2}]$$
Earlier illustration using two autocorrelations as S(x)
[Marin et al., Stat. & Comp., 2011]
Result #2: embedded models, with simulations from MA(1) lying within those from MA(2), hence linear classification performs poorly
toy: MA(1) vs. MA(2)
Simulations of S(x) under MA(1) (blue) and MA(2) (orange)
toy: MA(1) vs. MA(2)
Comparing an MA(1) and an MA(2) model:
$$x_t = \varepsilon_t - \vartheta_1 \varepsilon_{t-1}\,[-\vartheta_2 \varepsilon_{t-2}]$$
Earlier illustration using two autocorrelations as S(x)
[Marin et al., Stat. & Comp., 2011]
Result #3: on such a small-dimensional problem, random forests should come second to k-nn or kernel discriminant analyses
toy: MA(1) vs. MA(2)
| classification method | prior error rate (in %) |
|---|---|
| LDA | 27.43 |
| Logist. reg. | 28.34 |
| SVM (library e1071) | 17.17 |
| "naïve" Bayes (with G marg.) | 19.52 |
| "naïve" Bayes (with NP marg.) | 18.25 |
| ABC k-nn (k = 100) | 17.23 |
| ABC k-nn (k = 50) | 16.97 |
| Local log. reg. (k = 1000) | 16.82 |
| Random Forest | 17.04 |
| Kernel disc. ana. (KDA) | 16.95 |
| True MAP | 12.36 |
Evolution scenarios based on SNPs
Three scenarios for the evolution of three populations from their most common ancestor
Evolution scenarios based on SNPs
DIYABC header (!)
7 parameters and 48 summary statistics
3 scenarios: 7 7 7
scenario 1 [0.33333] (6)
N1 N2 N3
0 sample 1
0 sample 2
0 sample 3
ta merge 1 3
ts merge 1 2
ts varne 1 N4
scenario 2 [0.33333] (6)
N1 N2 N3
..........
ts varne 1 N4
scenario 3 [0.33333] (7)
N1 N2 N3
........
historical parameters priors (7,1)
N1 N UN[100.0,30000.0,0.0,0.0]
N2 N UN[100.0,30000.0,0.0,0.0]
N3 N UN[100.0,30000.0,0.0,0.0]
ta T UN[10.0,30000.0,0.0,0.0]
ts T UN[10.0,30000.0,0.0,0.0]
N4 N UN[100.0,30000.0,0.0,0.0]
r A UN[0.05,0.95,0.0,0.0]
ts>ta
DRAW UNTIL
Evolution scenarios based on SNPs
Model 1 with 6 parameters:
• four effective sample sizes: N1 for population 1, N2 for population 2, N3 for population 3 and, finally, N4 for the native population;
• the time of divergence ta between populations 1 and 3;
• the time of divergence ts between populations 1 and 2;
• effective sample sizes with independent uniform priors on [100, 30000];
• vector of divergence times (ta, ts) with uniform prior on {(a, s) ∈ [10, 30000] × [10, 30000] | a < s}
Evolution scenarios based on SNPs
Model 2 with the same parameters as model 1, but the divergence time ta corresponds to a divergence between populations 2 and 3; prior distributions identical to those of model 1.
Model 3 with an extra seventh parameter, the admixture rate r: for that scenario, at time ta there is an admixture between populations 1 and 2 from which population 3 emerges. Prior distribution on r uniform on [0.05, 0.95]. In that case models 1 and 2 are not embedded in model 3. Prior distributions for the other parameters the same as in model 1.
Evolution scenarios based on SNPs
Set of 48 summary statistics:
Single sample statistics
• proportion of loci with null gene diversity (= proportion of monomorphic loci)
• mean gene diversity across polymorphic loci [Nei, 1987]
• variance of gene diversity across polymorphic loci
• mean gene diversity across all loci
Evolution scenarios based on SNPs
Set of 48 summary statistics:
Two sample statistics
• proportion of loci with null FST distance between both samples [Weir and Cockerham, 1984]
• mean across loci of non-null FST distances between both samples
• variance across loci of non-null FST distances between both samples
• mean across loci of FST distances between both samples
• proportion of loci with null Nei's distance between both samples [Nei, 1972]
• mean across loci of non-null Nei's distances between both samples
• variance across loci of non-null Nei's distances between both samples
• mean across loci of Nei's distances between the two samples
Evolution scenarios based on SNPs
Set of 48 summary statistics:
Three sample statistics
• proportion of loci with null admixture estimate
• mean across loci of non-null admixture estimates
• variance across loci of non-null admixture estimates
• mean of admixture estimates across all loci
Evolution scenarios based on SNPs
For a sample of 1000 SNPs measured on 25 diploid individuals per population, learning the ABC reference table with 20,000 simulations, prior predictive error rates:
• "naïve Bayes" classifier 33.3%
• raw LDA classifier 23.27%
• ABC k-nn [Euclidean dist. on summaries normalised by MAD] 25.93%
• ABC k-nn [unnormalised Euclidean dist. on LDA components] 22.12%
• local logistic classifier based on LDA components with k = 500 neighbours 22.61%, k = 1000 neighbours 22.46%, k = 5000 neighbours 22.43%
• random forest on summaries 21.03%
• random forest on LDA components only 23.1%
• random forest on summaries and LDA components 19.03%
(Error rates computed on a prior sample of size 10⁴)
Evolution scenarios based on SNPs
Posterior predictive error rates
favourable: 0.010 error – unfavourable: 0.104 error
Evolution scenarios based on microsatellites
Same setting as previously
Sample of 25 diploid individuals per population, at 20 loci (roughly corresponds to 1/5 of the previous information)
Evolution scenarios based on microsatellites
One sample statistics
• mean number of alleles across loci
• mean gene diversity across loci (Nei, 1987)
• mean allele size variance across loci
• mean M index across loci (Garza and Williamson, 2001; Excoffier et al., 2005)
Evolution scenarios based on microsatellites
Two sample statistics
• mean number of alleles across loci (two samples)
• mean gene diversity across loci (two samples)
• mean allele size variance across loci (two samples)
• FST between two samples (Weir and Cockerham, 1984)
• mean index of classification (two samples) (Rannala and Mountain, 1997; Pascual et al., 2007)
• shared allele distance between two samples (Chakraborty and Jin, 1993)
• (δμ)² distance between two samples (Goldstein et al., 1995)
Three sample statistics
• maximum likelihood coefficient of admixture (Choisy et al., 2004)
Evolution scenarios based on microsatellites
| classification method | prior error rate (in %)* |
|---|---|
| raw LDA | 35.64 |
| "naïve" Bayes (with G marginals) | 40.02 |
| k-nn (MAD normalised sum stat) | 37.47 |
| k-nn (unnormalised LDA) | 35.14 |
| RF without LDA components | 35.14 |
| RF with LDA components | 33.62 |
| RF with only LDA components | 37.25 |

*estimated on pseudo-samples of 10⁴ items drawn from the prior
Evolution scenarios based on microsatellites
Posterior predictive error rates
favourable: 0.183 error – unfavourable: 0.435 error
Back to Asian Ladybirds [message in a beetle]
Comparing 10 scenarios of Asian beetle invasion [beetle moves]
Back to Asian Ladybirds [message in a beetle]
Comparing 10 scenarios of Asian beetle invasion [beetle moves]

| classification method | prior error rate (in %)† |
|---|---|
| raw LDA | 38.94 |
| "naïve" Bayes (with G margins) | 54.02 |
| k-nn (MAD normalised sum stat) | 58.47 |
| RF without LDA components | 38.84 |
| RF with LDA components | 35.32 |

†estimated on pseudo-samples of 10⁴ items drawn from the prior
Back to Asian Ladybirds [message in a beetle]
Comparing 10 scenarios of Asian beetle invasion [beetle moves]
Random forest allocation frequencies:

| scenario | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| frequency | 0.168 | 0.100 | 0.008 | 0.066 | 0.296 | 0.016 | 0.092 | 0.040 | 0.014 | 0.200 |

Posterior predictive error based on 20,000 prior simulations and keeping 500 neighbours (or 100 neighbours and 10 pseudo-datasets per parameter): 0.3682
Back to Asian Ladybirds [message in a beetle]
Comparing 10 scenarios of Asian beetle invasion
Back to Asian Ladybirds [message in a beetle]
Comparing 10 scenarios of Asian beetle invasion
[Figure: posterior predictive error as a function of the number of neighbours k (0 to 2000); posterior predictive error 0.368]
conclusion on random forests
• unlimited aggregation of arbitrary summary statistics
• recovery of discriminant statistics when available
• automated implementation with reduced calibration
• self-evaluation by posterior predictive error
• soon to appear in DIYABC
ABC estimation via random forests
1 simulation-based methods in Econometrics
2 Genetics of ABC
3 Approximate Bayesian computation
4 ABC for model choice
5 ABC model choice via random forests
6 ABC estimation via random forests
7 [some] asymptotics of ABC
Two basic issues with ABC
ABC compares numerous simulated datasets to the observed one
Two major difficulties:
• to decrease the approximation error (or tolerance ε) and hence ensure the reliability of ABC, the total number of simulations must be very large;
• the calibration of ABC (tolerance, distance, summary statistics, post-processing, &tc) is critical and hard to automatise
classification of summaries by random forests
Given a large collection of summary statistics, rather than selecting a subset and excluding the others, estimate each parameter of interest by a machine learning tool like random forests
• RF can handle thousands of predictors
• ignores useless components
• fast estimation method with good local properties
• automatised method with few calibration steps
• substitute to Fearnhead and Prangle (2012) preliminary estimation of θ̂(y^obs)
• includes a natural (classification) distance measure that avoids the choice of both distance and tolerance
[Marin et al., 2016]
random forests as non-parametric regression
CART means Classification and Regression Trees
For regression purposes, i.e., to predict y as f(x), similar binary trees in random forests:
1 at each tree node, split data into two daughter nodes
2 split variable and bound chosen to minimise a heterogeneity criterion
3 stop splitting when there is enough homogeneity in the current branch
4 predicted values at terminal nodes (or leaves) are the average response variable y for all observations in the final leaf
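For instance, a single CART regression tree can be grown in R with rpart (the data-generating process below is a toy, invented for illustration):

```r
## One regression tree: binary splits minimising the (anova) heterogeneity
## criterion; predictions are the leaf averages of the response y.
library(rpart)

n <- 500
d <- data.frame(x = runif(n))
d$y <- sin(2 * pi * d$x) + rnorm(n, sd = 0.3)

tree <- rpart(y ~ x, data = d, method = "anova",
              control = rpart.control(minsplit = 5, cp = 0.001))
predict(tree, data.frame(x = c(0.1, 0.5, 0.9)))  # leaf averages
```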
Illustration
[Figure: conditional expectation f(x) and a well-specified dataset]
Illustration
single regression tree
Illustration
ten regression trees obtained by bagging (Bootstrap AGGregatING)
Illustration
average of 100 regression trees
bagging reduces learning variance
When growing a forest with many trees,
• grow each tree on an independent bootstrap sample
• at each node, select m variables at random out of all M possible variables
• find the best dichotomous split on the selected m variables
• estimate the predictor function by averaging trees
Improves on CART with respect to accuracy and stability
prediction error
A given simulation (y^sim, x^sim) in the training table is not used in about 1/3 of the trees ("out-of-bag" case)
Average the predictions F̂^oob(x^sim) of these trees to give an out-of-bag predictor of y^sim
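With the randomForest package these out-of-bag predictions come for free; a hedged sketch, reusing the classification forest rf and table ref from the earlier sketch:

```r
## rf$predicted holds, for each training row, the aggregated prediction of
## the trees whose bootstrap sample did not contain that row (OOB case).
oob_pred <- rf$predicted
mean(oob_pred != ref$M)   # out-of-bag misclassification rate
```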
Related methods
• adjusted local linear: Beaumont et al. (2002) Approximate Bayesian computation in population genetics, Genetics
• ridge regression: Blum et al. (2013) A Comparative Review of Dimension Reduction Methods in Approximate Bayesian Computation, Statistical Science
• linear discriminant analysis: Estoup et al. (2012) Estimation of demo-genetic model probabilities with Approximate Bayesian Computation using linear discriminant analysis on summary statistics, Molecular Ecology Resources
• adjusted neural networks: Blum and François (2010) Non-linear regression models for Approximate Bayesian Computation, Statistics and Computing
ABC parameter estimation (ODOF)
One dimension = one forest (ODOF) methodology
Parametric statistical model:
$$\{f(y; \theta) : y \in \mathcal{Y},\, \theta \in \Theta\}, \qquad \mathcal{Y} \subseteq \mathbb{R}^n, \; \Theta \subseteq \mathbb{R}^p$$
with intractable density f(·; θ), plus prior distribution π(θ)
Inference on a quantity of interest
$$\psi(\theta) \in \mathbb{R}$$
(posterior means, variances, quantiles or covariances)
common reference table
Given a collection of summary statistics η : 𝒴 → ℝ^k,
• produce a reference table (RT) used as learning dataset for multiple random forests
• meaning, for 1 ≤ t ≤ N:
1 simulate θ^(t) ∼ π(θ)
2 simulate y_t = (y_{1,t}, . . . , y_{n,t}) ∼ f(y; θ^(t))
3 compute η(y_t) = {η₁(y_t), . . . , η_k(y_t)}
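A generic sketch of this loop; prior_sim(), model_sim() and summaries() are hypothetical placeholders for the prior sampler, the intractable model's simulator and the summary map η:

```r
## Build the common reference table: N draws of (theta, eta(y)).
N <- 10000; d <- 2; k <- 53
theta_table <- matrix(NA, N, d)
eta_table   <- matrix(NA, N, k)
for (t in 1:N) {
  theta            <- prior_sim()       # 1. theta^(t) ~ pi(theta)
  y                <- model_sim(theta)  # 2. y_t ~ f(y; theta^(t))
  theta_table[t, ] <- theta
  eta_table[t, ]   <- summaries(y)      # 3. eta(y_t)
}
```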
ABC posterior expectations
Recall that θ = (θ₁, . . . , θ_d) ∈ ℝ^d
For each θ_j, construct a separate RF regression with predictor variables equal to the summary statistics η(y) = {η₁(y), . . . , η_k(y)}
If L_b(η(y*)) denotes the leaf index of the b-th tree associated with η(y*) (the leaf reached through the path of binary choices in the tree), with |L_b| response variables, then
$$\widehat{E}(\theta_j \mid \eta(y^*)) = \frac{1}{B} \sum_{b=1}^{B} \frac{1}{|L_b(\eta(y^*))|} \sum_{t :\, \eta(y_t) \in L_b(\eta(y^*))} \theta_j^{(t)}$$
is our ABC estimate
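A hedged sketch of one such per-parameter forest with randomForest (the course's dedicated implementation is the abcrf package and its regAbcrf function), reusing theta_table and eta_table from the reference-table sketch:

```r
## ODOF: one regression forest per parameter theta_j, predictors = summaries.
library(randomForest)

j <- 1
rf_j <- randomForest(x = eta_table, y = theta_table[, j], ntree = 500)

eta_obs <- eta_table[1, , drop = FALSE]  # stand-in for eta(y*)
predict(rf_j, eta_obs)                   # ABC estimate of E(theta_j | eta(y*))
```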
ABC posterior quantile estimate
Random forests also available for quantile regression
[Meinshausen, 2006, JMLR]
Since
$$\widehat{E}(\theta_j \mid \eta(y^*)) = \sum_{t=1}^{N} w_t(\eta(y^*))\, \theta_j^{(t)}$$
with
$$w_t(\eta(y^*)) = \frac{1}{B} \sum_{b=1}^{B} \frac{\mathbb{I}_{L_b(\eta(y^*))}(\eta(y_t))}{|L_b(\eta(y^*))|}$$
a natural estimate of the cdf of θ_j is
$$\widehat{F}(u \mid \eta(y^*)) = \sum_{t=1}^{N} w_t(\eta(y^*))\, \mathbb{I}\{\theta_j^{(t)} \leq u\}$$
ABC posterior quantiles and credible intervals are then given by $\widehat{F}^{-1}$
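The weights w_t(η(y*)) can be extracted from randomForest via the terminal-node indices returned by predict(..., nodes = TRUE); a hedged sketch, reusing rf_j, eta_table, theta_table, j and eta_obs from above:

```r
## Forest weights: for each tree b, give mass 1/|L_b| to the reference-table
## points falling in the same leaf as eta(y*), then average over trees.
nodes_train <- attr(predict(rf_j, eta_table, nodes = TRUE), "nodes")
nodes_obs   <- attr(predict(rf_j, eta_obs,   nodes = TRUE), "nodes")

B <- ncol(nodes_train)
w <- rowMeans(vapply(seq_len(B), function(b) {
  in_leaf <- nodes_train[, b] == nodes_obs[1, b]
  in_leaf / sum(in_leaf)
}, numeric(nrow(eta_table))))

## weighted posterior cdf of theta_j and, e.g., a 95% credible interval
ord <- order(theta_table[, j])
cdf <- cumsum(w[ord])
theta_table[ord, j][c(which(cdf >= 0.025)[1], which(cdf >= 0.975)[1])]
```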
ABC variances
Even though an approximation of Var(θ_j | η(y*)) is available based on F̂, an alternative and slightly more involved version is chosen
In a given tree b of the random forest, existence of out-of-bag entries, i.e., entries not sampled in the associated bootstrap subsample
Use of out-of-bag simulations to produce estimates θ̂_j^(t) of E{θ_j | η(y_t)}
Apply the weights w_t(η(y*)) to the out-of-bag residuals:
$$\widehat{\mathrm{Var}}(\theta_j \mid \eta(y^*)) = \sum_{t=1}^{N} w_t(\eta(y^*)) \big\{\theta_j^{(t)} - \widehat{\theta}_j^{(t)}\big\}^2$$
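A matching sketch of this variance estimate, where the out-of-bag predictions rf_j$predicted stand in for the θ̂_j^(t) and the weights w come from the previous sketch:

```r
## OOB residuals weighted by w_t(eta(y*)) give Var-hat(theta_j | eta(y*)).
theta_hat_oob <- rf_j$predicted      # OOB estimates of E{theta_j | eta(y_t)}
sum(w * (theta_table[, j] - theta_hat_oob)^2)
```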
ABC covariances
For estimating Cov(θ_j, θ_ℓ | η(y*)), construction of a specific random forest on the product of the out-of-bag errors for θ_j and θ_ℓ,
$$\big\{\theta_j^{(t)} - \widehat{\theta}_j^{(t)}\big\}\big\{\theta_\ell^{(t)} - \widehat{\theta}_\ell^{(t)}\big\}$$
with, again, predictor variables the summary statistics η(y) = {η₁(y), . . . , η_k(y)}
Gaussian toy example
Take
$$(y_1, \ldots, y_n) \mid \theta_1, \theta_2 \overset{\text{iid}}{\sim} \mathcal{N}(\theta_1, \theta_2), \qquad n = 10$$
$$\theta_1 \mid \theta_2 \sim \mathcal{N}(0, \theta_2), \qquad \theta_2 \sim \mathcal{IG}(4, 3)$$
so that
$$\theta_1 \mid y \sim \mathcal{T}\big(n + 8,\, n\bar{y}/(n + 1),\, (s^2 + 6)/\{(n + 1)(n + 8)\}\big)$$
$$\theta_2 \mid y \sim \mathcal{IG}\big(n/2 + 4,\, s^2/2 + 3\big)$$
Closed-form theoretical values like ψ₁(y) = E(θ₁ | y), ψ₂(y) = E(θ₂ | y), ψ₃(y) = Var(θ₁ | y) and ψ₄(y) = Var(θ₂ | y)
Gaussian toy example
Reference table of N = 10,000 Gaussian replicates
Independent Gaussian test set of size N_pred = 100
k = 53 summary statistics: the sample mean, the sample variance, the sample median absolute deviation, and 50 independent pure-noise variables (uniform on [0, 1])
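The toy's reference table is a few lines of R; a sketch under the stated prior, with IG(4, 3) read as shape 4 and scale 3 (an assumption about the parameterisation):

```r
## Toy reference table: n = 10 observations per replicate; 53 summaries =
## sample mean, variance, MAD plus 50 uniform pure-noise variables.
N <- 10000; n <- 10
theta2 <- 1 / rgamma(N, shape = 4, rate = 3)   # theta_2 ~ IG(4, 3)
theta1 <- rnorm(N, 0, sqrt(theta2))            # theta_1 | theta_2 ~ N(0, theta_2)
eta_table <- t(vapply(seq_len(N), function(t) {
  y <- rnorm(n, theta1[t], sqrt(theta2[t]))
  c(mean(y), var(y), mad(y), runif(50))
}, numeric(53)))
theta_table <- cbind(theta1, theta2)
```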
Gaussian toy example
[Figure: scatterplots of the theoretical values ψ₁, . . . , ψ₄ against their corresponding estimates ψ̃₁, . . . , ψ̃₄]
Gaussian toy example
[Figure: scatterplots of the theoretical 2.5% and 97.5% posterior quantiles of θ₁ and θ₂ against their corresponding estimates]
Gaussian toy example
| quantity | ODOF | adj local linear | adj ridge | adj neural net |
|---|---|---|---|---|
| ψ₁(y) = E(θ₁ ∣ y) | 0.21 | 0.42 | 0.38 | 0.42 |
| ψ₂(y) = E(θ₂ ∣ y) | 0.11 | 0.20 | 0.26 | 0.22 |
| ψ₃(y) = Var(θ₁ ∣ y) | 0.47 | 0.66 | 0.75 | 0.48 |
| ψ₄(y) = Var(θ₂ ∣ y) | 0.46 | 0.85 | 0.73 | 0.98 |
| Q₀.₀₂₅(θ₁ ∣ y) | 0.69 | 0.55 | 0.78 | 0.53 |
| Q₀.₀₂₅(θ₂ ∣ y) | 0.06 | 0.45 | 0.68 | 1.02 |
| Q₀.₉₇₅(θ₁ ∣ y) | 0.48 | 0.55 | 0.79 | 0.50 |
| Q₀.₉₇₅(θ₂ ∣ y) | 0.18 | 0.23 | 0.23 | 0.38 |

Comparison of normalized mean absolute errors
Gaussian toy example
[Figure: boxplot comparison of the estimates of Var(θ₁ | y) and Var(θ₂ | y) across the true values, ODOF, and the usual ABC methods (local linear, ridge, neural net)]
Comments
ABC-RF methods are mostly insensitive both to strong correlations between the summary statistics and to the presence of noisy variables;
this implies fewer simulations and no calibration
Next steps: adaptive schemes, deep learning, inclusion in DIYABC
[some] asymptotics of ABC
1 simulation-based methods in Econometrics
2 Genetics of ABC
3 Approximate Bayesian computation
4 ABC for model choice
5 ABC model choice via random forests
6 ABC estimation via random forests
7 [some] asymptotics of ABC
consistency of ABC posteriors
Asymptotic study of the ABC posterior, with pseudo-data z = z(n)
• ABC posterior consistency and convergence rate (in n)
• asymptotic shape of πε(· | y(n))
• asymptotic behaviour of θ̂ε = E_ABC[θ | y(n)]
[Frazier et al., 2016]
consistency of ABC posteriors
• concentration around the true value and Bayesian consistency hold under less stringent conditions on the convergence speed of the tolerance εn to zero, when compared with asymptotic normality of the ABC posterior
• asymptotic normality of the ABC posterior mean does not require asymptotic normality of the ABC posterior
ABC posterior consistency
For a sample y = y(n) and a tolerance ε = εn, when n → +∞, assuming a parametric model θ ∈ ℝ^k, k fixed:
• concentration of the summary η(z): there exists b(θ) such that
$$\eta(z) - b(\theta) = o_{P_\theta}(1)$$
• consistency:
$$\Pi_{\varepsilon_n}\big(\|\theta - \theta_0\| \leq \delta \mid y\big) = 1 + o_p(1)$$
• convergence rate: there exists δn = o(1) such that
$$\Pi_{\varepsilon_n}\big(\|\theta - \theta_0\| \leq \delta_n \mid y\big) = 1 + o_p(1)$$
Related results
Existing studies on the large sample properties of ABC, in which the asymptotic properties of point estimators derived from ABC have been the primary focus
[Creel et al., 2015; Jasra, 2015; Li & Fearnhead, 2015]
Convergence when εn ≳ σn
Under the assumptions
(A1) there exists a sequence σn → 0 such that
$$P_\theta\big(\sigma_n^{-1}\|\eta(z) - b(\theta)\| > u\big) \leq c(\theta)h(u), \qquad \lim_{u \to +\infty} h(u) = 0$$
(A2) $\Pi(\|b(\theta) - b(\theta_0)\| \leq u) \asymp u^D$ for $u \approx 0$
posterior consistency holds, with a posterior concentration rate that depends on the deviation control of d²{η(z), b(θ)}; the posterior concentration rate for b(θ) is bounded from below by O(εn)
Under the same assumptions (A1) and (A2), then
$$\Pi_{\varepsilon_n}\big(\|b(\theta) - b(\theta_0)\| \lesssim \varepsilon_n + \sigma_n h^{-1}(\varepsilon_n^D) \mid y\big) = 1 + o_{p_0}(1)$$
If also $\|\theta - \theta_0\| \leq L\|b(\theta) - b(\theta_0)\|^\alpha$ locally and θ → b(θ) is one-to-one, then
$$\Pi_{\varepsilon_n}\Big(\|\theta - \theta_0\| \lesssim \underbrace{\varepsilon_n^\alpha + \sigma_n^\alpha \{h^{-1}(\varepsilon_n^D)\}^\alpha}_{\delta_n} \mid y\Big) = 1 + o_{p_0}(1)$$
Comments
• if $P_\theta(\sigma_n^{-1}\|\eta(z) - b(\theta)\| > u) \leq c(\theta)h(u)$, two cases:
1 polynomial tail, $h(u) \lesssim u^{-\kappa}$: then $\delta_n = \varepsilon_n + \sigma_n \varepsilon_n^{-D/\kappa}$
2 exponential tail, $h(u) \lesssim e^{-cu}$: then $\delta_n = \varepsilon_n + \sigma_n \log(1/\varepsilon_n)$
• e.g., $\eta(y) = n^{-1}\sum_i g(y_i)$ with moment conditions on g (case 1) or a Laplace transform (case 2)
Comments
• $\Pi(\|b(\theta) - b(\theta_0)\| \leq u) \asymp u^D$: if Π is regular enough then D = dim(θ)
• no need to approximate the density f(η(y) | θ)
• the same result holds when εn = o(σn) if (A2) is replaced with
$$\inf_{|x| \leq M} P_\theta\big(|\sigma_n^{-1}(\eta(z) - b(\theta)) - x| \leq u\big) \gtrsim u^D, \qquad u \approx 0$$
proof
Simple enough proof: assume σn ≤ δεn and
$$|\eta(y) - b(\theta_0)| \lesssim \sigma_n, \qquad \|\eta(y) - \eta(z)\| \leq \varepsilon_n$$
Hence
$$\|b(\theta) - b(\theta_0)\| > \delta_n \Rightarrow |\eta(z) - b(\theta)| > \delta_n - \varepsilon_n - \sigma_n := t_n$$
Also, if $\|b(\theta) - b(\theta_0)\| \leq \varepsilon_n/3$,
$$\|\eta(y) - \eta(z)\| \leq |\eta(z) - b(\theta)| + \underbrace{\sigma_n}_{\leq \varepsilon_n/3} + \varepsilon_n/3$$
and
$$\Pi_{\varepsilon_n}\big(\|b(\theta) - b(\theta_0)\| > \delta_n \mid y\big) \leq \frac{\int_{\|b(\theta) - b(\theta_0)\| > \delta_n} P_\theta\big(|\eta(z) - b(\theta)| > t_n\big)\,\mathrm{d}\Pi(\theta)}{\int_{|b(\theta) - b(\theta_0)| \leq \varepsilon_n/3} P_\theta\big(|\eta(z) - b(\theta)| \leq \varepsilon_n/3\big)\,\mathrm{d}\Pi(\theta)} \lesssim \varepsilon_n^{-D}\, h(t_n \sigma_n^{-1}) \int_\Theta c(\theta)\,\mathrm{d}\Pi(\theta)$$
Summary statistic and (in)consistency
Consider the moving average MA(2) model
$$y_t = e_t + \theta_1 e_{t-1} + \theta_2 e_{t-2}, \qquad e_t \overset{\text{i.i.d.}}{\sim} \mathcal{N}(0, 1)$$
and
$$-2 \leq \theta_1 \leq 2, \qquad \theta_1 + \theta_2 \geq -1, \qquad \theta_1 - \theta_2 \leq 1$$
summary statistics equal to sample autocovariances
$$\eta_j(y) = T^{-1} \sum_{t=1+j}^{T} y_t y_{t-j}, \qquad j = 0, 1$$
with
$$\eta_0(y) \xrightarrow{P} E[y_t^2] = 1 + (\theta_1^0)^2 + (\theta_2^0)^2, \qquad \eta_1(y) \xrightarrow{P} E[y_t y_{t-1}] = \theta_1^0(1 + \theta_2^0)$$
For the ABC target $p_\varepsilon(\theta \mid \eta(y))$ to be degenerate at θ0,
$$0 = b(\theta_0) - b(\theta) = \begin{pmatrix} 1 + (\theta_1^0)^2 + (\theta_2^0)^2 \\ \theta_1^0(1 + \theta_2^0) \end{pmatrix} - \begin{pmatrix} 1 + \theta_1^2 + \theta_2^2 \\ \theta_1(1 + \theta_2) \end{pmatrix}$$
must have the unique solution θ = θ0
Take θ1⁰ = .6, θ2⁰ = .2: the equation has 2 solutions,
$$\theta_1 = .6,\ \theta_2 = .2 \qquad \text{and} \qquad \theta_1 \approx .5453,\ \theta_2 \approx .3204$$
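A two-line check of this identifiability failure:

```r
## Both parameter pairs map to the same limiting autocovariances b(theta).
b <- function(theta) c(1 + theta[1]^2 + theta[2]^2,
                       theta[1] * (1 + theta[2]))
b(c(0.6, 0.2))        # 1.40 0.72
b(c(0.5453, 0.3204))  # 1.40 0.72 (up to rounding)
```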
Asymptotic shape of posterior distribution
Three different regimes:
1 σn = o(εn) −→ uniform limit
2 σn ≍ εn −→ perturbed Gaussian limit
3 εn = o(σn) −→ Gaussian limit
Assumptions
• (B1) concentration of the summary η: Σn(θ) ∈ ℝ^{k1×k1} is o(1) and
$$\Sigma_n(\theta)^{-1}\{\eta(z) - b(\theta)\} \Rightarrow \mathcal{N}_{k_1}(0, \mathrm{Id}), \qquad \Sigma_n(\theta)\Sigma_n(\theta_0)^{-1} \text{ bounded}$$
• (B2) b(θ) is C¹ and
$$\|\theta - \theta_0\| \lesssim \|b(\theta) - b(\theta_0)\|$$
• (B3) dominated convergence and
$$\lim_n \frac{P_\theta\big(\Sigma_n(\theta)^{-1}\{\eta(z) - b(\theta)\} \in u + B(0, u_n)\big)}{\prod_j u_n(j)} \to \varphi(u)$$
main result
Set $\Sigma_n(\theta) = \sigma_n D(\theta)$ for θ ≈ θ0 and $Z^o = \Sigma_n(\theta_0)^{-1}(\eta(y) - b(\theta_0))$; then, under (B1) and (B2),
• when $\varepsilon_n \sigma_n^{-1} \to +\infty$
$$\Pi_{\varepsilon_n}\big[\varepsilon_n^{-1}(\theta - \theta_0) \in A \mid y\big] \Rightarrow U_{B_0}(A), \qquad B_0 = \{x \in \mathbb{R}^k : \|b'(\theta_0)^T x\| \leq 1\}$$
• when $\varepsilon_n \sigma_n^{-1} \to c$
$$\Pi_{\varepsilon_n}\big[\Sigma_n(\theta_0)^{-1}(\theta - \theta_0) - Z^o \in A \mid y\big] \Rightarrow Q_c(A), \qquad Q_c \neq \mathcal{N}$$
• when $\varepsilon_n \sigma_n^{-1} \to 0$ and (B3) holds, set $V_n = [b'(\theta_0)]^T \Sigma_n(\theta_0)\, b'(\theta_0)$; then
$$\Pi_{\varepsilon_n}\big[V_n^{-1}(\theta - \theta_0) - Z^o \in A \mid y\big] \Rightarrow \Phi(A)$$
intuition
Set $x(\theta) = \sigma_n^{-1}(\theta - \theta_0) - Z^o$ (k = 1); then
$$\pi_n := \Pi_{\varepsilon_n}\big[\varepsilon_n^{-1}(\theta - \theta_0) \in A \mid y\big] = \frac{\int_{|\theta - \theta_0| \leq u_n} 1\!\mathrm{l}_{x(\theta) \in A}\, P_\theta\big(\|\sigma_n^{-1}(\eta(z) - b(\theta)) + x(\theta)\| \leq \sigma_n^{-1}\varepsilon_n\big)\, p(\theta)\,\mathrm{d}\theta}{\int_{|\theta - \theta_0| \leq u_n} P_\theta\big(\|\sigma_n^{-1}(\eta(z) - b(\theta)) + x(\theta)\| \leq \sigma_n^{-1}\varepsilon_n\big)\, p(\theta)\,\mathrm{d}\theta} + o_p(1)$$
• If $\varepsilon_n/\sigma_n \gg 1$:
$$P_\theta\big(|\sigma_n^{-1}(\eta(z) - b(\theta)) + x(\theta)| \leq \sigma_n^{-1}\varepsilon_n\big) = 1 + o(1), \quad \text{iff } |x| \leq \sigma_n^{-1}\varepsilon_n + o(1)$$
• If $\varepsilon_n/\sigma_n = o(1)$:
$$P_\theta\big(|\sigma_n^{-1}(\eta(z) - b(\theta)) + x| \leq \sigma_n^{-1}\varepsilon_n\big) = \varphi(x)\sigma_n(1 + o(1))$$
more comments
• surprising: U(−εn, εn) limit when εn ≫ σn, but not so much, since εn = o(1) means concentration around θ0 and σn = o(εn) implies that b(θ) − b(θ0) ≈ η(z) − η(y)
• again, there is no true need to control the approximation of f(η(y) | θ) by a Gaussian density: merely a control of the distribution
• we have $\tilde{Z}^o = Z^o / b'(\theta_0)$, like the asymptotic score
• generalisation to the case where the eigenvalues of Σn are $d_{n,1} \neq \cdots \neq d_{n,k}$
• behaviour of E_ABC(θ | y) as in Li & Fearnhead (2016)
even more comments
If (also) p(θ) is Hölder β,
$$E_{ABC}(\theta \mid y) - \theta_0 = \underbrace{\frac{\sigma_n Z^o}{b'(\theta_0)}}_{\text{score for } f(\eta(y)|\theta)} + \underbrace{\sum_{j=1}^{\lfloor \beta/2 \rfloor} \varepsilon_n^{2j} H_j(\theta_0, p, b)}_{\text{bias from threshold approx.}} + o(\sigma_n) + O(\varepsilon_n^{\beta+1})$$
• if $\varepsilon_n^2 = o(\sigma_n)$: efficiency,
$$E_{ABC}(\theta \mid y) - \theta_0 = \frac{\sigma_n Z^o}{b'(\theta_0)} + o(\sigma_n)$$
• the $H_j(\theta_0, p, b)$'s are deterministic
• we gain nothing by getting a first crude estimate $\hat\theta(y) = E_{ABC}(\theta \mid y)$ for some η(y) and then rerunning ABC with $\hat\theta(y)$
impact of the dimension of η
The dimension of η(·) does not impact the above result, but it impacts the acceptance probability
• if $\varepsilon_n = o(\sigma_n)$, with $k_1 = \dim(\eta(y))$, $k = \dim(\theta)$ and $k_1 \geq k$:
$$\alpha_n := \Pr\big(\|y - z\| \leq \varepsilon_n\big) \asymp \varepsilon_n^{k_1} \sigma_n^{-k_1 + k}$$
• if $\varepsilon_n \gtrsim \sigma_n$:
$$\alpha_n := \Pr\big(\|y - z\| \leq \varepsilon_n\big) \asymp \varepsilon_n^{k}$$
• if we choose αn:
  • $\alpha_n = o(\sigma_n^k)$ leads to $\varepsilon_n = \sigma_n(\alpha_n \sigma_n^{-k})^{1/k_1} = o(\sigma_n)$
  • $\alpha_n \gtrsim \sigma_n^k$ leads to $\varepsilon_n \asymp \alpha_n^{1/k}$
conclusion on ABC consistency
• asymptotic description of ABC: different regimes depending on how εn compares with σn
• no point in choosing εn arbitrarily small: εn = o(σn) suffices
• no gain in iterative ABC
• results under weak conditions, obtained without studying g(η(z) | θ)
the end