Tuning indirect predictions based on SNP effects from...
Transcript of Tuning indirect predictions based on SNP effects from...
Indirect predictions based on SNP effects from ssGBLUP
in large genotyped populations
Daniela Lourenco
A. Legarra, S. Tsuruta, Y. Masuda, T. Lawlor, I. Misztal
ADSA 2018
Why Indirect predictions (IP)?
• Interim evaluations• Between official runs
• Not all genotyped animals are in the evaluations• Animals with incomplete pedigree increase bias and lower R2
• Commercial products• e.g. GeneMax for non-registered animals
2
Indirect predictions in ssGBLUP
X′X X′W
W′X W′W+𝐇−𝟏λ
𝑏 𝑢=
X′y
W′y
H−1=A−1+0 0
0 G−𝟏– A22
−1
𝒂 = λ𝐃 𝐙′G−1 𝒖
GAPY−1
GEBVsSNP effects
3
GEBVyoung = w1PA + w2GP – w3PP
GEBVyoung ≈ GP
Lourenco et al., 2015
COR( GEBV𝑦𝑜𝑢𝑛𝑔, 𝐙 a) > 0.99
GEBVyoung ≈ GP = 𝐙 a
Problems with Indirect predictions
COR( GEBV𝑦𝑜𝑢𝑛𝑔, 𝐙 a) > 0.99
Avg( GEBV) ≈ 100 Avg(𝐙 a) ≈ 0
4
Objectives
1) Fine-tune indirect predictions to be compatible with GEBV
2) Investigate whether SNP effects are accurate when APY is used
• Possibly use subset of core animals
5
Dataset
• US Holstein 2014 type data
• 8.3M animals in pedigree
• No UPG
• 9.2M Udder Depth (UD)
• h2 = 0.33
• 9.2M Foot Angle (FA)
• h2 = 0.11
• 105k genotyped
• Training: 100k
• Validation: 5k
• Complete
• Phenotypes up to 2010
• Genotypes up to 2010 (105k)
• Reduced
• Phenotypes up to 2010
• NO Genotypes for validation (100k)
• SNP effects and IP from Reduced
• Compare with GEBV from Complete
6
Accuracy of SNP effects
𝒂𝐆 = λ𝐃 𝐙′G−1 u
7
G-1
𝒂𝐆𝑨𝑷𝒀−𝟏 𝐑 = λ𝐃 𝐙
′𝐆APY_𝑟𝑎𝑛𝑑𝑜𝑚−1 𝒖APY
𝒂𝐆𝑨𝑷𝒀−𝟏 𝐇 = λ𝐃 𝐙
′𝐆APY_ℎ𝑖𝑔ℎ_𝑟𝑒𝑙𝑖𝑎𝑏𝑖𝑙𝑖𝑡𝑦−1 𝒖APY
APY G-1
15k core85k noncore
100k
𝒂𝐆𝒄𝒄−𝟏𝐑 = λ𝐃 𝐙′Gcc_𝑟𝑎𝑛𝑑𝑜𝑚−1 𝒖APY
𝒂𝐆𝒄𝒄−𝟏𝐇 = λ𝐃 𝐙′G𝐶𝐶_ℎ𝑖𝑔ℎ_𝑟𝑒𝑙𝑖𝑎𝑏𝑖𝑙𝑖𝑡𝑦−1 𝒖APY
G-1 core
15k
Statistics for SNP effects - 𝐆APY−1
8Udder Depth Foot Angle
G−1
𝐆APY_𝐻𝑖𝑔ℎ−1
𝐆APY_𝑅𝑎𝑛𝑑−1
G−1
𝐆APY_𝐻𝑖𝑔ℎ−1
𝐆APY_𝑅𝑎𝑛𝑑−1
9
Statistics for SNP effects - 𝐆CC−1
Udder Depth
c
Foot Angle
c
G−1
G𝐶𝐶_𝐻𝑖𝑔ℎ−1
G𝐶𝐶_𝑅𝑎𝑛𝑑−1
G−1
G𝐶𝐶_𝐻𝑖𝑔ℎ−1
G𝐶𝐶_𝑅𝑎𝑛𝑑−1
Statistics for 𝐙 a - 𝐆CC−1
10Udder Depth
c
Foot Angle
c
G−1
G𝐶𝐶_𝐻𝑖𝑔ℎ−1
G𝐶𝐶_𝑅𝑎𝑛𝑑−1
G−1
G𝐶𝐶_𝐻𝑖𝑔ℎ−1
G𝐶𝐶_𝑅𝑎𝑛𝑑−1
Understanding genetic and genomic bases
• Base of BLUP: founders of the pedigree
• Base of GBLUP: genotyped animals
• Base of SSGBLUP: Vitezica et al. (2011) modeled as a mean for genotyped
• 𝑝 𝒖𝑔 = 𝑁 𝟏𝜇, 𝐆
• 𝜇 = (Pedigree base) – (Genomic base)
Fine-tuning indirect predictions from ssGBLUP
11
Fine-tuning indirect predictions from ssGBLUP
𝒖𝑖𝑝= 𝜇 + 0.95𝐙 a + 0.05 𝒖𝑝𝑎𝑟𝑒𝑛𝑡𝑠1) Formula in Legarra (2017)
2) Double fitting
a) fit a regression using genotyped animals in the evaluation
GEBV𝑒𝑣𝑎𝑙 = 𝑏0 + 𝑏1𝐙 𝑎
b) apply regression for indirectly predicted animals
𝒖𝑖𝑝= 𝑏0 + 𝑏1𝐙 𝑎
3) Add average GEBV 𝒖𝑖𝑝= 𝐺𝐸𝐵𝑉𝑒𝑣𝑎𝑙 + 𝐙 𝑎
SNP and pedigree fractions
SNP and pedigree fractions
12
Bias of indirect predictions
13
-1
0
1
2
3
4
5
6
GEBV − uip
Foot AngleUdder Depth
-1
0
1
2
3
4
5
6
GEBV − uip
14
0.980.98
0.98 0.980.980.98
0.98 0.98
Za Legarra_2017 Double_Fitting Average_GEBV
Co
rrel
atio
n(G
EBV,
uip
)
FA UD
Correlation between GEBV and indirect predictions
Fine-tuning indirect predictions in ssGBLUP
E( 𝒖| 𝒂) = 𝜇 + 𝐙𝟏
𝟐 𝒑(𝟏 − 𝒑)𝐈𝟏
𝟐 𝒑(𝟏 − 𝒑)
−𝟏
( 𝒂 − 0)
E( 𝒖| 𝒂) = 𝜇 + 𝐙 𝒂
E( 𝒖| 𝒂) = 𝐺𝐸𝐵𝑉 + 𝐙 𝒂
≈
15
Final Remarks
• SNP effects can be calculated based on APY G-1 or core G-1
• Slightly less accurate with core
• Indirect predictions based on core animals are accurate
• Reduction in computing time compared to G-1
• Similar computing time as APY G-1
• Indirect predictions are unbiased after corrections
• Average GEBV, Legarra (2017) or double fitting
• Can be used as interim evaluation16
Acknowledgements
17