Fully powered polygenic prediction using summary statistics
![Page 1: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/1.jpg)
Fully powered polygenic prediction
using summary statistics
Alkes L. Price
Harvard T.H. Chan School of Public Health
October 7, 2015
To download slides of this talk: google “Alkes HSPH”
![Page 2: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/2.jpg)
Summary statistics are widely available
—Nat Genet editorial, July 2012
![Page 3: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/3.jpg)
Outline
1. A brief history of summary statistic genetics
2. Introduction to polygenic prediction using summary statistics
3. LDpred method for polygenic prediction using summary statistics
4. Application of LDpred to real data sets
![Page 5: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/5.jpg)
Definition of summary statistics
Definition: Summary statistics consist of:
• GWAS association z-scores for each typed or imputed SNP +
• Sample sizes on which z-scores were computed (may vary by SNP)
Note: Many applications also require LD information computed from
a reference panel (e.g. 1000 Genomes or UK10K) using a population
“very similar” to the target sample.
![Page 6: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/6.jpg)
Meta-analysis can be performed
using summary statistics
Evangelou & Ioannidis 2013 Nat Rev Genet
![Page 7: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/7.jpg)
Joint and conditional analysis can be
performed using summary statistics
Yang et al. 2012 Nat Genet
![Page 8: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/8.jpg)
Imputation can be performed
using summary statistics
Lee et al. 2013 Bioinformatics; Pasaniuc et al. 2014 Bioinformatics
also see Park et al. 2015 Bioinformatics, Lee et al. 2015 Bioinformatics
![Page 9: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/9.jpg)
Rare variant meta-analysis can be
performed using summary statistics
Lee et al. 2013 AJHG; Hu et al. 2013 AJHG; Liu et al. 2014 Nat Genet
also see Clarke et al. 2013 PLoS Genet, Tang & Lin 2015 AJHG
![Page 10: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/10.jpg)
Genetic variance and covariance can be
inferred using summary statistics
Palla & Dudbridge 2015 AJHG; Bulik-Sullivan et al. 2015 Nat Genet
![Page 11: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/11.jpg)
Functional enrichment can be
inferred using summary statistics
Pickrell 2014 AJHG; Kichaev & Pasaniuc 2015 AJHG; Finucane et al. 2015 Nat Genet
![Page 12: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/12.jpg)
Many projects at ASHG 2015
using summary statistics
• Invited talks Pickrell, Pasaniuc, Im (this session)
• Platform talks 11 Gusev, 77 Cichonska, 220 Golan, 272 Park
• Posters 791 Kichaev, 797 Shi, 807 Roytman, 860 Salem,
868 Pare, 1301 Wu, 1334 Zhu, 1357 Chatterjee, 1477 Brown,
1618 Li, 1668 Khawaja, 1686 Lee, 1687 Zhao, 1728 Torres,
1867 O’Connor
![Page 13: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/13.jpg)
Outline
2. Introduction to polygenic prediction using summary statistics
![Page 14: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/14.jpg)
Genetic prediction: why care?
Erbe et al. 2012 J Dairy Sci; Goss et al. 2011 New Engl J Med
![Page 15: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/15.jpg)
Using only genome-wide significant SNPs
is a Stone Age genetic prediction method
$$\hat{\varphi}_k = \sum_{i \,\in\, \text{published SNPs}} \hat{\beta}_i \, x_{ik}$$
φk = phenotype for sample k
βi = effect size for SNP i
xik = genotype for SNP i, sample k
How should we conduct
genetic prediction, Fred?
Prediction $r^2$ is less than half the $r^2$ attained by polygenic prediction
PGC-SCZ 2014 Nature; Vilhjalmsson et al. 2015 AJHG
![Page 16: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/16.jpg)
Polygenic prediction can be performed
using genome-wide summary statistics
$$\hat{\varphi}_k = \sum_{i \,\in\, \text{all GWAS SNPs}} \hat{\beta}_i \, x_{ik}$$
φk = phenotype for sample k
βi = effect size for SNP i
xik = genotype for SNP i, sample k
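The score on this slide is a weighted sum of genotypes over SNPs. A minimal Python sketch of that sum, assuming the effect sizes and genotypes are already aligned to the same SNPs and alleles; the function name `polygenic_score` and the toy numbers are illustrative and not from the talk:

```python
def polygenic_score(beta_hat, genotypes):
    """Return sum_i beta_hat[i] * x[i] for one sample's genotype vector."""
    if len(beta_hat) != len(genotypes):
        raise ValueError("effect sizes and genotypes must cover the same SNPs")
    return sum(b * x for b, x in zip(beta_hat, genotypes))


# Toy data (hypothetical): three SNPs, two samples.
beta_hat = [0.02, -0.01, 0.005]           # estimated effect size per SNP
samples = {
    "sample_A": [0, 1, 2],                # allele counts per SNP
    "sample_B": [2, 2, 0],
}
scores = {name: polygenic_score(beta_hat, x) for name, x in samples.items()}
for name, score in scores.items():
    print(name, round(score, 4))
```

The same function covers the genome-wide-significant case on the previous slide: only the set of SNPs passed in changes.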
![Page 17: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/17.jpg)
Is polygenic prediction using raw genotypes
more accurate than using summary statistics?
Answer: slightly.
using summary statistics (fit each SNP individually):
$$r^2 = h_g^2 \cdot \frac{h_g^2}{h_g^2 + M/N}$$
<
using raw genotypes (fit all SNPs simultaneously; BLUP prediction; Henderson 1975 Biometrics):
$$r^2 = h_g^2 \cdot \frac{h_g^2}{h_g^2 + M(1 - r^2)/N}$$
$h_g^2$ = heritability explained by SNPs
M = number of (unlinked) SNPs
N = number of training samples
Daetwyler et al. 2008 PLoS ONE; Wray et al. 2013 Nat Rev Genet
also see Speed & Balding 2014 Genome Res (multiBLUP)
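To see how small the gap between the two approaches is, the two expected-accuracy formulas on this slide can be evaluated numerically. The sketch below assumes the summary-statistic formula r² = h_g² · h_g² / (h_g² + M/N) and the raw-genotype formula r² = h_g² · h_g² / (h_g² + M(1 − r²)/N); the second is implicit in r², so it is solved here as a quadratic. Function names and the example values of h_g², M and N are illustrative assumptions:

```python
import math


def r2_summary_stats(h2g, M, N):
    """Expected r^2 when each SNP is fit individually."""
    return h2g * h2g / (h2g + M / N)


def r2_raw_genotypes(h2g, M, N):
    """Expected r^2 when all SNPs are fit jointly (BLUP).

    The formula is implicit in r^2; rearranging gives
    a*x^2 - (h2g + a)*x + h2g^2 = 0 with a = M/N and x = r^2.
    The smaller root is the one that does not exceed h2g.
    """
    a = M / N
    b = h2g + a
    disc = b * b - 4.0 * a * h2g * h2g
    return (b - math.sqrt(disc)) / (2.0 * a)


# Hypothetical scenario: h_g^2 = 0.5, 60,000 unlinked SNPs, 100,000 samples.
h2g, M, N = 0.5, 60_000, 100_000
r2_ss = r2_summary_stats(h2g, M, N)
r2_blup = r2_raw_genotypes(h2g, M, N)
print(f"summary statistics: {r2_ss:.4f}")
print(f"raw genotypes:      {r2_blup:.4f}")
```

Both values approach h_g² as N grows, which is why the raw-genotype advantage shrinks at large sample sizes.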
![Page 18: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/18.jpg)
Accounting for non-infinitesimal architectures
can improve polygenic prediction
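Infinitesimal (Gaussian) architecture:
$$\beta_i \sim N\!\left(0,\; h_g^2/M\right), \qquad \hat{\beta}_i - \beta_i \sim N\!\left(0,\; 1/N\right)$$
=>
$$E(\beta_i \mid \hat{\beta}_i) = \frac{h_g^2}{h_g^2 + M/N}\, \hat{\beta}_i$$
Uniform shrink on estimated effect sizes is appropriate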
![Page 19: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/19.jpg)
Accounting for non-infinitesimal architectures
can improve polygenic prediction
Non-infinitesimal architecture:
(e.g. point-normal mixture, mixture of normals, etc.)
Non-uniform shrink on estimated effect sizes is appropriate
![Page 20: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/20.jpg)
Accounting for non-infinitesimal architectures
can improve polygenic prediction
Infinitesimal (Gaussian) architecture:
$$\beta_i \sim N\!\left(0,\; h_g^2/M\right), \qquad \hat{\beta}_i - \beta_i \sim N\!\left(0,\; 1/N\right)$$
=>
$$E(\beta_i \mid \hat{\beta}_i) = \frac{h_g^2}{h_g^2 + M/N}\, \hat{\beta}_i$$
Uniform shrink on estimated effect sizes is appropriate
Non-infinitesimal architecture:
(e.g. point-normal mixture, mixture of normals, etc.)
Non-uniform shrink on estimated effect sizes is appropriate
Standard heuristic approach: P-value thresholding
$$\hat{\varphi}_k = \sum_{i:\; \text{P-value}_i < P_T} \hat{\beta}_i \, x_{ik}$$
(Note: requires optimization of
$P_T$ threshold in validation samples)
Purcell et al. 2009 Nature; Chatterjee et al. 2013 Nat Genet; Dudbridge 2013 PLoS Genet
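P-value thresholding keeps only the SNPs whose training P-value falls below P_T and sums their unshrunk effect sizes. A minimal sketch, assuming standardized genotypes and phenotypes so that the effect estimate can be taken as z/√N (consistent with a sampling variance of 1/N); the function names and toy values are illustrative:

```python
import math


def two_sided_p(z):
    """Two-sided P-value of a z-score under the standard normal."""
    return math.erfc(abs(z) / math.sqrt(2.0))


def thresholded_score(z_scores, n_train, genotypes, p_threshold):
    """Sum beta_hat_i * x_i over SNPs with P-value < p_threshold.

    Assumption: beta_hat_i = z_i / sqrt(N) on the standardized scale.
    """
    score = 0.0
    for z, x in zip(z_scores, genotypes):
        if two_sided_p(z) < p_threshold:
            score += (z / math.sqrt(n_train)) * x
    return score


# Toy data (hypothetical): three SNPs, one validation sample.
z_scores = [5.0, 1.0, -3.0]
genotypes = [1.0, 1.0, 1.0]
strict = thresholded_score(z_scores, 10_000, genotypes, p_threshold=0.01)
lenient = thresholded_score(z_scores, 10_000, genotypes, p_threshold=1.0)
print(strict, lenient)
```

In practice the threshold would be chosen by repeating this over a grid of P_T values and keeping the one with the best accuracy in validation samples, as the note on the slide says.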
![Page 21: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/21.jpg)
Accounting for linkage disequilibrium
can improve polygenic prediction
Problem: P-value thresholding,
$$\hat{\varphi}_k = \sum_{i:\; \text{P-value}_i < P_T} \hat{\beta}_i \, x_{ik},$$
does not account for LD between SNPs
Standard heuristic approaches:
Random LD-pruning: prune SNPs (e.g. $r^2 < 0.2$),
removing one of each pair of linked SNPs
(decide randomly which SNP to remove)
Informed LD-pruning (LD-clumping): prune SNPs,
removing one of each pair of linked SNPs
(remove SNP with less significant P-value in training data)
Purcell et al. 2009 Nature; Stahl et al. 2012 Nat Genet
also see Rietveld et al. 2013 Science (COJO)
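Informed LD-pruning (LD-clumping) can be sketched as a greedy pass over SNPs sorted by training P-value. The version below is a simplification that assumes a full pairwise r² matrix is available and ignores physical distance windows, which real clumping tools also apply; `ld_clump` and the toy matrix are illustrative:

```python
def ld_clump(p_values, r2, r2_threshold=0.2):
    """Greedy LD-clumping.

    Visit SNPs from most to least significant; keep a SNP only if its r^2
    with every SNP already kept is below r2_threshold.
    Returns the indices of retained SNPs in ascending order.
    """
    order = sorted(range(len(p_values)), key=lambda i: p_values[i])
    kept = []
    for i in order:
        if all(r2[i][j] < r2_threshold for j in kept):
            kept.append(i)
    return sorted(kept)


# Toy data (hypothetical): SNPs 0 and 1 are linked, SNPs 2 and 3 are linked.
p_values = [1e-8, 1e-3, 0.5, 1e-5]
r2 = [
    [1.0, 0.8, 0.0, 0.0],
    [0.8, 1.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.5],
    [0.0, 0.0, 0.5, 1.0],
]
retained = ld_clump(p_values, r2)
print(retained)
```

Random LD-pruning differs only in the visiting order: shuffling `order` instead of sorting by P-value gives the random variant.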
![Page 22: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/22.jpg)
Pruning + Thresholding is widely used …
Purcell et al. 2009 Nature; Lango Allen et al. 2010 Nature; Ripke et al. 2011 Nat Genet;
Stahl et al. 2012 Nat Genet; Deloukas et al. 2013 Nat Genet; Ripke et al. 2013 Nat Genet;
Chatterjee et al. 2013 Nat Genet; Dudbridge 2013 PLoS Genet; PGC-SCZ 2014 Nature
![Page 23: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/23.jpg)
Pruning + Thresholding is widely used, but
does not attain maximum prediction accuracy
Vilhjalmsson et al. 2015 AJHG
Simulations at different proportions p of causal SNPs:
[Figure labels: $h_g^2$; Infinitesimal; Non-infinitesimal]
![Page 24: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/24.jpg)
Outline
3. LDpred method for polygenic prediction using summary statistics
![Page 26: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/26.jpg)
LDpred computes posterior means under a
point-normal prior, accounting for LD
Vilhjalmsson et al. 2015 AJHG
$$\hat{\varphi}_k = \sum_{i \,\in\, \text{all GWAS SNPs}} E(\beta_i \mid \hat{\beta}_i) \, x_{ik}$$
φk = phenotype for sample k
βi = effect size for SNP i
xik = genotype for SNP i, sample k
where $E(\beta_i \mid \hat{\beta}_i)$ are posterior mean effect sizes based on
• point-normal prior with 2 parameters:
$h_g^2$ = heritability explained by SNPs (estimated from training data)
p = proportion of causal SNPs (optimized in validation samples)
• LD from a reference panel
Use validation samples as LD reference
(restrict to SNPs with validation data)
![Page 27: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/27.jpg)
In the special case of no LD between SNPs,
posterior means can be computed analytically
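$$E(\beta_i \mid \hat{\beta}_i) = \frac{h_g^2}{h_g^2 + Mp/N}\; \bar{p}_i \, \hat{\beta}_i$$
where
$$\bar{p}_i = \frac{\dfrac{p}{\sqrt{h_g^2/(Mp) + 1/N}}\; e^{-\frac{\hat{\beta}_i^2}{2\left(h_g^2/(Mp) + 1/N\right)}}}{\dfrac{p}{\sqrt{h_g^2/(Mp) + 1/N}}\; e^{-\frac{\hat{\beta}_i^2}{2\left(h_g^2/(Mp) + 1/N\right)}} + \dfrac{1 - p}{\sqrt{1/N}}\; e^{-\frac{\hat{\beta}_i^2}{2(1/N)}}}$$
is the posterior probability that $\beta_i \neq 0$, i.e. SNP i is causal
(generalizes uniform shrink when p = 1: infinitesimal prior, no LD)
$h_g^2$ = heritability explained by SNPs
p = proportion of causal SNPs
M = number of (unlinked) SNPs
N = number of training samples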
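The no-LD posterior mean is the product of a shrink factor h_g² / (h_g² + Mp/N), the posterior probability that the SNP is causal, and the estimated effect. A minimal sketch under the point-normal prior described on this slide; the causal probability is computed in log-odds form, which is algebraically the same ratio of a causal-component density to the total but avoids underflow for large effects. Function names and parameter values are illustrative assumptions:

```python
import math


def posterior_causal_prob(beta_hat, h2g, p, M, N):
    """Posterior probability that a SNP is causal (no LD, point-normal prior)."""
    if p >= 1.0:
        return 1.0  # infinitesimal prior: every SNP is causal
    v_causal = h2g / (M * p) + 1.0 / N   # variance of beta_hat if causal
    v_null = 1.0 / N                     # variance of beta_hat if not causal
    log_odds = (math.log(p / (1.0 - p))
                + 0.5 * math.log(v_null / v_causal)
                + 0.5 * beta_hat ** 2 * (1.0 / v_null - 1.0 / v_causal))
    return 1.0 / (1.0 + math.exp(-log_odds))


def posterior_mean(beta_hat, h2g, p, M, N):
    """E(beta | beta_hat) = shrink * causal probability * beta_hat."""
    shrink = h2g / (h2g + M * p / N)
    return shrink * posterior_causal_prob(beta_hat, h2g, p, M, N) * beta_hat


# Hypothetical setting: h_g^2 = 0.5, 100,000 SNPs, 50,000 training samples.
h2g, M, N = 0.5, 100_000, 50_000
for b in (0.005, 0.03):
    print(b,
          round(posterior_mean(b, h2g, 1.0, M, N), 6),    # uniform shrink
          round(posterior_mean(b, h2g, 0.01, M, N), 6))   # non-uniform shrink
```

With p = 0.01 the small estimate is shrunk almost to zero while the large one is barely shrunk, which is the non-uniform behaviour that P-value thresholding approximates with a hard cutoff.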
![Page 28: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/28.jpg)
In the special case of infinitesimal prior (with LD),
posterior means can be computed analytically
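$$E(\beta \mid \hat{\beta}) = \left(\frac{M}{N h_g^2}\, I + D\right)^{-1} \hat{\beta}$$
where D is an LD matrix from a reference panel
and $\beta$, $\hat{\beta}$ are vectors of true and estimated effect sizes
(generalizes uniform shrink when D = I: infinitesimal prior, no LD)
$h_g^2$ = heritability explained by SNPs
M = number of (unlinked) SNPs
N = number of training samples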
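Under the infinitesimal prior the posterior mean with LD is the solution of a ridge-like linear system, (M/(N h_g²) I + D) x = β̂. The sketch below solves it with a small pure-Python Gaussian elimination so that it runs without third-party packages; a numerical library would normally be used instead. The helper names and the two-SNP example are illustrative assumptions:

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def infinitesimal_posterior_mean(beta_hat, D, h2g, M, N):
    """Return (M/(N*h2g) * I + D)^{-1} beta_hat."""
    ridge = M / (N * h2g)
    m = len(beta_hat)
    A = [[D[i][j] + (ridge if i == j else 0.0) for j in range(m)]
         for i in range(m)]
    return solve_linear(A, beta_hat)


# Hypothetical two-SNP region, without and with LD (r = 0.9).
h2g, M, N = 0.5, 100_000, 50_000
beta_hat = [0.02, -0.01]
post_no_ld = infinitesimal_posterior_mean(beta_hat, [[1.0, 0.0], [0.0, 1.0]], h2g, M, N)
post_ld = infinitesimal_posterior_mean(beta_hat, [[1.0, 0.9], [0.9, 1.0]], h2g, M, N)
print(post_no_ld, post_ld)
```

With D = I the result reduces to the uniform shrink h_g² / (h_g² + M/N) applied to each estimate, matching the remark on the slide.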
![Page 32: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/32.jpg)
General case of non-infinitesimal prior with LD:
posterior means cannot be computed analytically
Possible solutions:
• Assume 1 causal variant per locus
• Iterative approach
• MCMC
![Page 33: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/33.jpg)
General case of non-infinitesimal prior with LD:
posterior means cannot be computed analytically
Solution: use MCMC.
Initialize $\beta_i$ = 0
At each big iteration
For each SNP i
Re-sample $\beta_i$ based on
• Point-normal prior on $\beta_i$
• Observed $\hat{\beta} \sim N(D\beta,\; D/N)$
$$f(\beta_i \mid \hat{\beta}) \propto f(\beta_i)\; e^{-\frac{N}{2} (\hat{\beta} - D\beta)^T D^{-1} (\hat{\beta} - D\beta)}$$
where $f(\beta_i)$ reflects point-normal prior (based on $h_g^2$ and p)
100 big iterations generally suffice for convergence
Rao-Blackwellization: average the posterior means sampled
Related MCMC methods for prediction from raw genotypes are described in
Erbe et al. 2012 J Dairy Sci, Zhou et al. 2013 PLoS Genet, Moser et al. 2015 PLoS Genet
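The sampling scheme described here can be sketched as a Gibbs sampler. For a unit-diagonal D, the conditional of one effect given all the others has the same form as the no-LD point-normal posterior, with the estimate replaced by a residual that subtracts the LD-weighted current values of the other SNPs. The code below is an illustrative sketch under that model, not the LDpred implementation; the function names, the `burn_in` parameter and the toy inputs are assumptions:

```python
import math
import random


def causal_prob(b, h2g, p, M, N):
    """Posterior probability that a SNP is causal given residual estimate b."""
    if p >= 1.0:
        return 1.0
    v_causal = h2g / (M * p) + 1.0 / N
    v_null = 1.0 / N
    log_odds = (math.log(p / (1.0 - p))
                + 0.5 * math.log(v_null / v_causal)
                + 0.5 * b * b * (1.0 / v_null - 1.0 / v_causal))
    return 1.0 / (1.0 + math.exp(-log_odds))


def gibbs_posterior_means(beta_hat, D, h2g, p, M, N,
                          n_iter=100, burn_in=10, seed=0):
    """Rao-Blackwellized posterior means under a point-normal prior with LD.

    D is assumed to have a unit diagonal; M is the genome-wide SNP count.
    """
    rng = random.Random(seed)
    m = len(beta_hat)
    slab_var = h2g / (M * p)
    shrink = slab_var / (slab_var + 1.0 / N)   # equals h2g / (h2g + M*p/N)
    beta = [0.0] * m                           # initialize all effects to 0
    mean_sum = [0.0] * m
    for it in range(n_iter):                   # "big" iterations
        for i in range(m):
            # Residual estimate after removing the other SNPs' LD contribution.
            ld_part = sum(D[i][j] * beta[j] for j in range(m) if j != i)
            resid = beta_hat[i] - ld_part
            p_bar = causal_prob(resid, h2g, p, M, N)
            if rng.random() < p_bar:
                beta[i] = rng.gauss(shrink * resid, math.sqrt(shrink / N))
            else:
                beta[i] = 0.0
            if it >= burn_in:
                # Average conditional posterior means, not the raw samples.
                mean_sum[i] += shrink * p_bar * resid
    kept = n_iter - burn_in
    return [s / kept for s in mean_sum]


# Hypothetical region: SNPs 0 and 1 in strong LD, SNP 2 independent.
beta_hat = [0.10, 0.09, 0.00]
D = [[1.0, 0.9, 0.0], [0.9, 1.0, 0.0], [0.0, 0.0, 1.0]]
post = gibbs_posterior_means(beta_hat, D, h2g=0.5, p=0.1, M=1000, N=5000)
print([round(b, 4) for b in post])
```

When D is the identity the residual equals the marginal estimate at every step, so the averaged value coincides with the analytic no-LD posterior mean; this gives a simple sanity check for the sketch.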
![Page 35: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/35.jpg)
LDpred performs well in simulations
Simulations with real genotypes, 1% of SNPs causal
![Page 36: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/36.jpg)
Understanding polygenic prediction
Let’s hide away and dance.
-- Freddie K.
Let’s hide away with data.
-- Alkes
![Page 37: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/37.jpg)
Outline
4. Application of LDpred to real data sets
![Page 38: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/38.jpg)
LDpred performs well on within-cohort
prediction of WTCCC traits …
$R^2_{\text{nag}}$, $R^2_{\text{obs}}$, $R^2_{\text{liab}}$ (see Lee et al. 2012 Genet Epidemiol)
Figure annotations: “Dominated by HLA”; “Do not validate in new cohort”
Data from WTCCC 2007 Nature. Results are similar to MCMC-based methods that
require raw genotypes: Zhou et al. 2013 PLoS Genet, Moser et al. 2015 PLoS Genet
![Page 42: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/42.jpg)
… but within-cohort prediction accuracy
may be too good to be true
Results presented for LDpred; similar relative results for other methods
Cryptic relatedness? Population structure? (Wray et al. 2013 Nat Rev Genet)

| $R^2_{\text{nag}}$ | CAD | T2D |
|---|---|---|
| Training: WTCCC, Validation: WTCCC | 0.0451 | 0.0467 |
| Training: WTCCC, Validation: WGHS | 0.0048 | 0.0095 |
![Page 43: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/43.jpg)
LDpred performs well on summary statistics
with independent validation cohorts
Training N=70K PGC-SCZ 2014 Nature; MGS replication sample
![Page 45: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/45.jpg)
LDpred performs well on summary statistics
with independent validation cohorts
Figure panels: Training N=70K; Training N=30K; Training N=60K; Training N=70K; Training N=90K
![Page 46: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/46.jpg)
LDpred performs well on summary statistics
with independent validation cohorts
Height: complexities due to population stratification.
Including PCs can improve prediction accuracy.
(Chen et al. 2015 Genet Epidemiol)
Training N=130K (Lango Allen et al. 2010 Nature)
![Page 47: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/47.jpg)
Conclusions …
• Explicitly modeling both LD and non-infinitesimal architectures
improves polygenic prediction from summary statistics.
• Polygenic prediction should be evaluated using independent
validation cohorts.
• Although polygenic predictions are not yet clinically useful,
prediction accuracies will increase as sample sizes increase
(bounded by heritability explained by SNPs; $h_g^2$).
![Page 48: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/48.jpg)
… and Future directions
• Polygenic prediction in non-European samples is challenging.
How to combine training data from Europeans (large sample size)
with training data from target population (small sample size)?
(cross-population genetic correlation; Poster 1477 Brown)
• Enrichment of heritability in functional annotation classes
could potentially be used to improve polygenic prediction
(Poster 1357 Chatterjee)
• Methods for large raw genotype data sets (e.g. UK Biobank)
should be developed in parallel with summary statistic methods
(Platform talk 38 Loh; Platform talk 170 Young)
![Page 49: Fully powered polygenic prediction using summary statistics · Fully powered polygenic prediction using summary statistics Alkes L. Price Harvard T.H. Chan School of Public Health](https://reader030.fdocuments.us/reader030/viewer/2022021723/5c8a765909d3f2d5658bce87/html5/thumbnails/49.jpg)
Acknowledgements
Bjarni Vilhjalmsson + Vilhjalmsson et al. 2015 AJHG co-authors
Everyone in alkesgrp. Please check out our other ASHG 2015 talks:
• Platform talk 11 Gusev “Large-scale transcriptome-wide association study …”
• Platform talk 38 Loh “Contrasting regional architectures of schizophrenia …”
• Platform talk 196 Bhatia “Haplotypes of common SNPs explain a large …”
• Platform talk 352 Galinsky “Population differentiation analysis of 54,734 …”
• Platform talk 346 Hayeck “Mixed model association with family-biased …”
• Platform talk 354 Palamara “Leveraging distant relatedness to quantify …”