Post on 20-May-2020
Extensionofsingle-stepGBLUPtomanygenotypedindividuals
IgnacyMisztalUniversityofGeorgia
Genomicselectionandsingle-step
• Simplicity– NoDYDorDP– Noindex– Nocomplexity
• Accuracy– Avoidsdoublecounting– Avoidsfixedindex– Accountsforpreselection bias
H-1 = A-1 +0 00 G-1 −A22
-1
⎡
⎣⎢⎢
⎤
⎦⎥⎥Aguilaretal.,2010
ChristensenandLund,2010
CurrentimplementationofSS
• GandA22 createdexplicitly• Quadraticmemoryandcubiccomputations• Costper100kgenotypes- 1.5hr (Aguilaretal.,2014)
é ùê úê ú-ë û
-1 -1
-1 -122
H = A +0 00 G A
Numberofgenotypesandimpendingproblem
>2MforHolsteins>400kforAngus
Genomicpre-selectionissue(Patry andDucrocq,2011;VanRaden etal.,2013)
– BLUPincreasinglybiased– Needalldataonpreselectionincluded
Unsymmetric equations
Misztaletal.,2009
NoconvergencewithoutgoodpreconditionerNoconvergencewithlargeHorA
NoGorA22 inversemodel
Legarra andDucrocq (2011)
SlowconvergencewithfewgenotypesDivergencewithmanygenotypes
SNPmodelforgenotypedanimals
Legarra andDucrocq,2011
Nosuccessfulprogramming
SNPmodelforgenotypedanimals
Liuetal,2014
SNPeffectsforallanimals(Fernandoetal.,2014)
centeredgenotypes
imputedgenotypes
CostofimputationRequiresnewtypeofprogrammingExtensiontocomplexmodelsunclear
CanregularssGBLUPbemademoreefficient?
ScalingupA22-1
𝐀𝟐𝟐#𝟏 = 𝐀𝟐𝟐 − 𝐀𝟐𝟏 𝐀𝟐𝟐#𝟏𝐀𝟏𝟐
• 𝐀𝟐𝟐#𝟏 dense(Fauxetal.,2014)
• ForPCGiteration(Stranden etal.,2014)
𝐀𝟐𝟐#𝟏𝐪 = 𝐀𝟐𝟐𝐪 − 𝐀𝟐𝟏 𝐀𝟐𝟐#𝟏
𝐀𝟏𝟐𝐪
• Secondsfor500kanimalswithgoodprogramming(Masudaetal.,2017)
Isdimensionalityofgenomicinformationlimited?
• RegularGnotpositivedefinitepast~5k– BlendingwithA(VanRaden,2008)
• DimensionalityofSNPBLUPsmall(Maciotta etal.,2013)
• Successofimputation
• Manhattanplotsnoisyuntilaveragedby300k-10Mb(dependingonspecies)
…………
Heterogeneticandhomogenic tractsingenome(Stam,1980)
E(#tracts)=4NeL(Stam,1980)Ne– effectivepopulationsizeL–lengthofgenomeinMorgans
Holsteins:Ne≈100L=30Me=12,000
InversionviaSVD/eigenvaluedecompositionAssume1millionanimalsgenotypedwith60kchip
𝐆 = 𝐙𝐙= = 𝐔𝐃𝐔= Eigenvalue decomposition (1M x 1M)
𝐆# = 𝐔𝐃#𝐔= Generalized inverse (1M x 1M)
Z = 𝐔𝐒𝐕 = 𝐔𝐃B.C𝐕 - SVD decomposition (1M x 60k)10h for 720k animals (Masuda, 2017)
t - index for non-negligible eigenvalues, say 10k𝐆# = 𝐔D 𝐃D#E𝐔D= = 𝐔D 𝐒F#E𝐒F#E𝐔D= = 𝐔∗ 𝐔∗
For PCG iteration𝐆#E𝐪 = 𝐔∗ 𝐔∗ 𝐪 - only 1 M x 10k elements
InversebyWoodburyformula
𝐆 = 𝐙𝐙′ + 𝐈𝛆, Woodbury formula𝐆#𝟏 = 𝟏
𝛆𝐈 − 𝟏
𝛆𝐙(𝟏
𝛆𝐙′𝐙 + 𝐈)#𝟏𝐙′ 𝟏
𝛆Z’Z 60k x 60k
For PCG iteration:
𝐆#𝟏𝐪 =𝟏𝛆 𝐈 − 𝐙(𝐔𝐃𝐔′)#𝟏𝐙′ 𝒒 =
𝟏𝛆 𝐈 − 𝐒𝐒′ 𝒒
𝐒 = 𝐙𝐔′𝐃#𝟏/𝟐
With reduced rank 𝐒 = 𝐙𝐔𝒕= 𝐃𝒕#𝟏𝟐 (1M x 10k)
Ostersen etal.,2017
Mantysaari etal.,2017
IfGhaslimiteddimensionality,canG-1
besparselikeA-1?
UseofalaHenderson’srules?
UseofrelativesforG-1
AccuraciesnotgoodenoughTheorynotclear
Assumptionoflimiteddimensionality
= +u Ts eBreedingvalue
s – nx1vectorcontainingadditiveinformationofpopulation(haplotypes,chromosomesegments,LDblocks)?
1 c c-»s T uIfuc containsnanimals:
Verysmallerror
Breedingvaluesofanynanimalscontainsalladditiveinformation
é ùé ù é ù= ê úê ú ê ú
ë û ë ûë ûc c
ncn n
I 0u uP Iu ε
é ùé ù é ù= ê úê ú ê ú
ë ûë û ë û0cncc
nc nn
I 0 I PG 0G
P I 0 IM
--
-
- é ùé ù é ù= ê úê ú ê ú-ë ûë û ë û
11
1cccn
ncnn
I 0G 0I PG
P I0 M0 I
Choosecore“c”andnoncore“n”animals
e= +n nc c nu P u=c cu u
=var( )n nnε M
HowtoestimateP andinv(G)?2var c cc cnu
n nc nn
sæ öé ù é ù
=ç ÷ê ú ê úë û ë ûè ø
u G Gu G G
cc cc cnnc cc
é ù é ùé ùê ú ê ú ë û
ë û ë û
-1 -1-1 -1 -1G 0 G GG = + M G 'G I
0 0 I
1 1| ,n c nc cc c nc cc- -= =u u G G u P G G
Gis“true”relationshipmatrix
APYalgorithm(AlgorithmforProvenandYoung)
PropertiesofAPYalgorithm
Cost:Almostlinearmemoryandcomputations
è
G G-1
G
è
APYG-1
Cost:Quadraticmemoryandcubiccomputations
EAAPmeeting2016
Reliabilities– Holsteins(77k)
Finalscore
regularG-1
4.5k 8k14k 19k77kNeL 2NeL 4NeL
Pocrnic etal.,2016b
Distributionofsegments/haplotypes/..
Assumeddimensionality
Reliability
100%
≈4NeL
Costswith720kgenotypedanimals
• 30MHolsteins• 50Mrecords• 764k60kgenotypes
Item BLUP ssGBLUPAPY G - 7hA22-1 - 10minrounds 402 464Time/round 51s 83sTotaltime 6h 17h
Masudaetal.,2017
Which core animals in APY?Bradfordetal.(2017)
§ Simulatedpopulations(QMSim;Sargolzaei andSchenkel,2009)
§ Ne=40§ #genotypedanimals=50,000
§ Coreanimals:§ Randomgen6||gen7||gen8||gen9||gen
10(y)§ Randomallgenerations
27
Which core animals in APY?
00.10.20.30.40.50.60.70.80.91
98% 95% 90%
Correlation(GEB
V,TBV
)
PercentofvariationexplainedinG
Accuracy
Gen6 Gen7 Gen8 Gen9 Gen10 Random
G-1
28
Bradfordetal.(2016)
4NeL NeL
Persistenceovergenerations
Generations
Reliability
1.0
BLUP
GBLUP– smallpopulation
BayesB – smallpopulation
GBLUP– verylargepopulation
GBLUP– verylargepopulation90%genome
Verylarge– equivalentto4NeLanimalswith99%accuracyAreSNPeffectsfromHolsteinnationalpopulationsconverging
TheoryoflimiteddimensionalityNumberofhaplotypes:4NeLNewithineach¼Morgansegment
¼Morgan
QTLsGenomehaplotypes
Dimensionalityof¼Morgancase:Ne
Fragomeni etal.,2018
ornumberofidentifiedQTLsèReduceddimensionalitywithweightedGRM
ssGBLUP accuraciesusingSNP60Kand100QTNs– simulationstudy
0 10 20 30 40 50 60 70 80 90 100
BLUP
ssGBLUP- unweightedSNP60k
unweightedSNP60k+100QTN
SNP60k+100QTNweightedbyGWAS
SNP60K+100QTNwith"true"variance
plusbyAPY
only100QTNunweightedbyAPY
Fragomeni etal.(2017)
Rank
19k
5k
98
Multitrait ssGBLUP orSNPselection?
• SNPselection/weighting(BayesB,etc.)– Largeimpactwithfewgenotypes– Littleornoimpactwithmany
Variancecomponents
• BasedonSNP– limitations
• REMLbasedonrelationships– Equationsnolongersparse– YAMSsparsematrixpackage–upto100timesspeedup(Masudaetal.,2017)
– APYforREML• MethodR(Legarra andReverter,2017)
Extratopics
• Matchingpedigreesandgenomicrelationships• Missingpedigrees• Crossbreeding• CausativeSNP
• Haplotypesforcrossbreds(Christensenetal.,2016)
• Metafounders (Legarra etal.,2016)• Approximationofreliabilities
Conclusions
• Limiteddimensionalityofgenomicinformationduetolimitedeffectivepopulationsize
• ssGBLUP suitableforanydatasetandmodel
• WithlargedatasetsforHolsteins:– Goodpersistenceofpredictions– Convergenceofpredictionsfromdifferentcountries
Acknowledgements
YutakaMasuda
ShogoTsuruta
DanielaLourenco
BrenoFragomeni
IgnacioAguilar
AndresLegarra
IvanPocrnic
HeatherBradford
TomLawlor,HolsteinAssocPaulVanRaden,AGILUSDA
TheoryforAPY
• Breedingvaluesofcoreanimalslinearfunctionsof:– Independentchromosomesegments(Me)– IndependenteffectiveSNP
• E(Me)=4NeL(Stam,1980;VanRaden,2008)Ne–effectivepopulationsizeL– lengthofgenomeinMorgans
Me=4(Ne=100)(L=30)=12,000
QTL
Accuracy and distance from markers to QTL
38
Fragomenietal.(2017)