Genetic variance partitioning and genome-wide prediction with allele dosage information in autotetraploid potato
Jeffrey B. Endelman*, Cari A. Schmitz Carley*, Paul C. Bethke*†, Joseph J. Coombs‡, Mark E. Clough§, Washington L. da Silva**, Walter S. De Jong**, David S. Douches‡, Curtis M. Frederick*, Kathleen G. Haynes††, David G. Holm‡‡, J. Creighton Miller Jr.§§, Patricio R. Muñoz***, Felix M. Navarro*, Richard G. Novy†††, Jiwan P. Palta*, Gregory A. Porter‡‡‡, Kyle T. Rak*, Vidyasagar R. Sathuvalli§§§, Asunta L. Thompson****, G. Craig Yencho§
*Department of Horticulture, University of Wisconsin, Madison, WI 53706, USA
†USDA-ARS Vegetable Crops Research Unit, Madison, WI 53706, USA
‡Department of Plant, Soil and Microbial Sciences, Michigan State University, East Lansing, MI 48824, USA
§Department of Horticultural Science, North Carolina State University, Raleigh, NC 27695, USA
**School of Integrative Plant Science, Cornell University, Ithaca, NY 14853, USA
††USDA-ARS Genetic Improvement of Fruits and Vegetables Laboratory, Beltsville, MD 20705, USA
‡‡San Luis Valley Research Center, Department of Horticulture and Landscape Architecture, Colorado State University, Center, CO 81125, USA
§§Department of Horticultural Sciences, Texas A&M University, College Station, TX 77843, USA
***Horticulture Sciences Department, University of Florida, Gainesville, FL 32611, USA
†††USDA–ARS Small Grains and Potato Germplasm Research Unit, Aberdeen, ID 83210, USA
‡‡‡School of Food and Agriculture, University of Maine, Orono, ME 04469, USA
§§§Department of Crop and Soil Science, Oregon State University, Hermiston, OR 97838, USA
****Department of Plant Sciences, North Dakota State University, Fargo, ND 58108, USA
Genetics: Early Online, published on March 8, 2018 as 10.1534/genetics.118.300685
Copyright 2018.
Running title: Average effects in tetraploid potato
Keywords: tetraploid, non-additive effects, genome-wide prediction, potato
Corresponding author:
Jeffrey B. Endelman
University of Wisconsin-Madison
1575 Linden Dr
Madison, WI 53706
Phone: 608-250-0754
Email: [email protected]
ABSTRACT
As one of the world’s most important food crops, potato (Solanum tuberosum L.) has spurred innovation in autotetraploid genetics, including the use of SNP arrays to determine allele dosage at thousands of markers. By combining genotype and pedigree information with phenotype data for economically important traits, the objectives of this study were to (1) partition the genetic variance into additive vs. non-additive components, and (2) determine the accuracy of genome-wide prediction. Between 2012 and 2017, a training population of 571 clones was evaluated for total yield, specific gravity, and chip fry color. Genomic covariance matrices for additive (G), digenic dominant (D), and additive x additive epistatic (G#G) effects were calculated using 3895 markers, and the numerator relationship matrix (A) was calculated from a 13-generation pedigree. Based on model fit and prediction accuracy, mixed model analysis with G was superior to A for yield and fry color but not specific gravity. The amount of additive genetic variance captured by markers was 20% of the total genetic variance for specific gravity, compared to 45% for yield and fry color. Within the training population, including non-additive effects improved accuracy and/or bias when predicting total genotypic value, for all three traits. When six F1 populations were used for validation, prediction accuracy ranged from 0.06 to 0.63 and was consistently lower (0.13 on average) without allele dosage information. We conclude that genome-wide prediction is feasible in potato and will improve selection for breeding value given the substantial amount of non-additive genetic variance in elite germplasm.
INTRODUCTION
Cultivated potato (Solanum tuberosum L.) is unique among the major, global food crops in that it is autotetraploid and clonally propagated. As of 2018 there are 12 public breeding programs in the US with a mandate to release varieties for commercial production, as well as several additional programs with a focus on germplasm enhancement. The variety development process begins with botanical seed from the sexual hybridization of heterozygous clones. Seedlings are grown in a greenhouse, and one or more tubers from each plant are retained for subsequent vegetative propagation. Crossing and seedling tuber production take 1–2 years, depending on the breeding program, followed by 1–2 years of field selection based primarily on visual assessment of tuber appearance, plant maturity, and yield components (tuber number and size), with some post-harvest evaluation for processing market types, such as fry color and specific gravity. Quantitative measurement of these traits in replicated and/or multi-location trials begins in field year (FY) three or four and continues for several years. Because it takes 3–4 years to establish clones as disease-free plantlets in vitro and produce foundation seed, new varieties are typically released 10–12 years after crossing, by which time dozens of traits have been evaluated. The duration of the potato breeding cycle, from sexual hybridization to incorporating new clones as parents, is shorter than the time to variety release but still 5–7 years for most US programs. Until now, US breeding programs have primarily used phenotypic selection in combination with genetic markers for a handful of major resistance genes (Lopez-Pardo et al. 2013).
The use of phenotype means (or BLUPs) for parent selection is not ideal because the estimates contain additive and non-additive genetic effects, but only the former are efficiently transmitted to offspring (Gallais 2003). A number of previous studies have investigated non-additive genetic variance in potato using factorial mating designs to estimate general and specific combining abilities (Plaisted et al. 1962; Tai 1976; Brown and Caligari 1989; Maris 1989; Neele et al. 1991; Gopal 1998; Bradshaw et al. 2000). General combining ability (GCA) is equivalent to the covariance between half-sibs, which equals $\frac{1}{4}V_a + \frac{1}{16}V_{aa} + \frac{1}{36}V_d + \cdots$ for autotetraploid loci in panmictic and linkage equilibrium (Kempthorne 1955c; Gallais 2003); the symbols $V_a$, $V_d$, and $V_{aa}$ are the genetic variances for additive, digenic dominance, and additive x additive epistasis effects, respectively. Specific combining ability (SCA) is the covariance between full-sibs $\left(\frac{1}{2}V_a + \frac{1}{4}V_{aa} + \frac{2}{9}V_d + \cdots\right)$ minus twice the covariance between half-sibs, which leads to an expression containing only non-additive genetic variances $\left(\frac{1}{6}V_d + \frac{1}{8}V_{aa} + \cdots\right)$. The SCA/GCA ratio therefore provides an indication of the importance of non-additive genetic variance. Although a wide range of values for SCA/GCA is found in the aforementioned references for potato, the general conclusion is that non-additive genetic variance is important in many contexts.
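The SCA expression quoted above follows from simple arithmetic on the sib covariance coefficients. The short sketch below (names such as `HALF_SIB` and `sca_coefficients` are illustrative, not from the paper) checks that the additive term cancels and that the remaining coefficients are 1/6 for $V_d$ and 1/8 for $V_{aa}$, using only the leading terms written out in the text.

```python
from fractions import Fraction as F

# Leading coefficients of the sib covariances quoted in the text
# (higher-order terms hidden in the "..." are ignored here).
HALF_SIB = {"Va": F(1, 4), "Vaa": F(1, 16), "Vd": F(1, 36)}
FULL_SIB = {"Va": F(1, 2), "Vaa": F(1, 4), "Vd": F(2, 9)}

def sca_coefficients(full_sib, half_sib):
    """SCA = cov(full-sibs) - 2 * cov(half-sibs), term by term."""
    return {k: full_sib[k] - 2 * half_sib[k] for k in full_sib}

SCA = sca_coefficients(FULL_SIB, HALF_SIB)
for name, coef in SCA.items():
    print(name, coef)
```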
For pedigreed populations, an alternative approach to estimating additive genetic variance is via the numerator relationship, or A, matrix. Kerr et al. (2012) were the first to publish a complete, recursive algorithm for A in autotetraploids, which has been applied to potato (Slater et al. 2014) and blueberry (Amadeu et al. 2016) populations. Mixed model analysis with A also enables selection on additive values calculated by best linear unbiased prediction, or BLUP (Henderson 1975). Although pedigree-BLUP is the cornerstone of genetic improvement for quantitative traits, the method has several limitations: (i) it depends on accurate pedigree records, (ii) it neglects genetic covariance between founders, and (iii) the covariance is based on expected, rather than realized, parental contribution.
Genomic selection (GS) has the potential to overcome these limitations by replacing A with a genomic covariance, or G, matrix estimated from markers (Bernardo 1994; Nejati-Javaremi et al. 1997; Habier et al. 2007; VanRaden 2008) or by estimating marker effects directly (Meuwissen et al. 2001). There have been several studies on GS in autotetraploid species (Li et al. 2015; Annichiarico et al. 2015; Slater et al. 2016; Habyarimana et al. 2017; Sverrisdóttir et al. 2017), but none have used non-additive genomic covariance matrices to partition genetic variance or predict total genotypic value. Both topics are addressed in this manuscript, building on analogous studies at the diploid level (Su et al. 2012; Xu 2013; Vitezica et al. 2013; Muñoz et al. 2014; Jiang and Reif 2015) and the classical theory of average effects in tetraploids.
THEORY
In a series of papers in 1955 (Kempthorne 1955a,b,c), which were further distilled in the monograph An Introduction to Genetic Statistics (Kempthorne 1957), Kempthorne developed the theory of average effects for arbitrary ploidy, drawing upon the same mathematical methods used in the analysis of variance for factorial experiments. Key results from this literature are reproduced here, as well as details on the derivation omitted by Kempthorne.
For an autotetraploid locus in panmictic equilibrium, assuming random bivalent formation (i.e., random chromosome segregation) and no inbreeding, the genotypic value ($g_{ijkl}$) of genotype ijkl (each index ranges from 1 to the number of alleles, and permutations of the indices are distinct) can be orthogonally decomposed into the population mean ($\mu$) plus four additive effects ($\alpha_i$) corresponding to the four genes, six digenic dominance effects ($\beta_{ij}$) for all pairs of genes, four trigenic interactions ($\gamma_{ijk}$), and one quadrigenic term ($\delta_{ijkl}$):
$$ g_{ijkl} - \mu = \alpha_i + \alpha_j + \alpha_k + \alpha_l + \beta_{ij} + \beta_{ik} + \beta_{il} + \beta_{jk} + \beta_{jl} + \beta_{kl} + \gamma_{ijk} + \gamma_{ijl} + \gamma_{ikl} + \gamma_{jkl} + \delta_{ijkl} \tag{1} $$
Eq. 1 uses standard notation from the analysis of factorial experiments, in which the symbols denote the regression coefficients, and the regressors are implied to be indicator variables. Focusing on the additive effects, and grouping the other parameters into a residual term, Eq. 1 becomes a regression of genotypic value on allele dosage (Fisher 1941). The average effects minimize the sum of squared residuals for the population, which is equivalent to a sum over genotypes weighted by genotype frequency $p_{ijkl}$ ($= p_i p_j p_k p_l$ under the assumptions of the model):
$$ \sum_{ijkl} p_{ijkl} \left[ (g_{ijkl} - \mu) - (\alpha_i + \alpha_j + \alpha_k + \alpha_l) \right]^2 \tag{2} $$
Taking the derivative with respect to the additive effect for each allele and equating the result to zero generates a set of normal equations, which can be solved (Supporting Methods, File S1) to produce the following linear constraint:
$$ 0 = \sum_i p_i \alpha_i \tag{3} $$
which is identical to the result for diploids. Substituting Eq. 3 into the normal equations (Eq. S3, File S1), the solution for the additive effect of an allele becomes the average genotypic value of all individuals with that allele (multiple doses contribute separately), relative to the population mean:
$$ \alpha_i = \sum_{jkl} p_j p_k p_l \,(g_{ijkl} - \mu) = \bar{g}_{i\cdot\cdot\cdot} - \mu \tag{4} $$
The residuals from the regression equation for the additive effects, which we denote by $y_{ijkl} \equiv g_{ijkl} - (\mu + \alpha_i + \alpha_j + \alpha_k + \alpha_l)$, are known as the dominance deviation. In diploids this deviation uniquely defines one dominance effect for each genotype, but in tetraploids the dominance deviation is composed of digenic, trigenic, and quadrigenic effects (Eq. 1). The tetraploid solution for the digenic effects corresponds to regressing the dominance deviation on the dosage of pairs of alleles, which involves minimizing the following sum of squared residuals:
$$ \sum_{ijkl} p_{ijkl} \left[ y_{ijkl} - (\beta_{ij} + \beta_{ik} + \beta_{il} + \beta_{jk} + \beta_{jl} + \beta_{kl}) \right]^2 \tag{5} $$
Taking the derivative with respect to the digenic effect for each allele pair, and equating the result to zero, generates a set of normal equations that can be solved (Supporting Methods, File S1) to produce the following linear constraint for any allele i:
$$ 0 = \sum_k p_k \beta_{ik} \tag{6} $$
which is the same result as for diploids. Substituting Eq. 6 into the normal equations (Eq. S6, File S1) leads to the solution
$$ \beta_{ij} = \sum_{kl} p_k p_l \, y_{ijkl} = \bar{g}_{ij\cdot\cdot} - \mu - \alpha_i - \alpha_j \tag{7} $$
By now the pattern is clear, and the least-squares solution for the trigenic effects can be written as
$$ \gamma_{ijk} = \bar{g}_{ijk\cdot} - \mu - \alpha_i - \alpha_j - \alpha_k - \beta_{ij} - \beta_{ik} - \beta_{jk} \tag{8} $$
Having solved for the additive, digenic, and trigenic terms, the quadrigenic effect $\delta_{ijkl}$ is the residual in Eq. 1.
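As a numerical illustration of Eqs. 3, 4, 6, and 7, the sketch below computes the additive and digenic effects for a single multi-allelic locus by direct averaging over genotype frequencies. It is a minimal sketch and not the authors' code: the function name `average_effects`, the three-allele frequencies, and the toy genotypic value function are all hypothetical choices made for the example.

```python
from itertools import product

def average_effects(p, g):
    """Return (mu, alpha, beta) for one locus under panmixia.

    p: list of allele frequencies (sums to 1)
    g: function (i, j, k, l) -> genotypic value, symmetric in its arguments
    """
    alleles = range(len(p))
    # Population mean: genotype frequencies are p_i * p_j * p_k * p_l
    mu = sum(p[i] * p[j] * p[k] * p[l] * g(i, j, k, l)
             for i, j, k, l in product(alleles, repeat=4))
    # Eq. 4: mean genotypic value of carriers of allele i, minus mu
    alpha = [sum(p[j] * p[k] * p[l] * g(i, j, k, l)
                 for j, k, l in product(alleles, repeat=3)) - mu
             for i in alleles]
    # Eq. 7: mean value of carriers of the pair (i, j), minus mu and additive effects
    beta = [[sum(p[k] * p[l] * g(i, j, k, l)
                 for k, l in product(alleles, repeat=2)) - mu - alpha[i] - alpha[j]
             for j in alleles] for i in alleles]
    return mu, alpha, beta

# Hypothetical three-allele locus: additive allele values plus a heterozygosity bonus
P = [0.2, 0.3, 0.5]
ALLELE_VALUE = [1.0, 2.0, 4.0]

def toy_value(i, j, k, l):
    genes = (i, j, k, l)
    return sum(ALLELE_VALUE[a] for a in genes) + 0.5 * len(set(genes)) ** 2

MU, ALPHA, BETA = average_effects(P, toy_value)
```

With these inputs the frequency-weighted sums of `ALPHA` and of each row of `BETA` vanish up to rounding error, which is what the constraints in Eqs. 3 and 6 require.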
The breeding value (BV) of an individual is defined as twice the mean genotypic value of its progeny relative to the population mean. Under the model assumptions, all six possible gene pairs for tetraploid genotype ijkl have equal frequency in its gametes, which in conjunction with Eq. 7 leads to the following expression:
$$ \mathrm{BV}_{ijkl} = 2\left[ \tfrac{1}{6}\left( \bar{g}_{ij\cdot\cdot} + \bar{g}_{ik\cdot\cdot} + \bar{g}_{il\cdot\cdot} + \bar{g}_{jk\cdot\cdot} + \bar{g}_{jl\cdot\cdot} + \bar{g}_{kl\cdot\cdot} \right) - \mu \right] = (\alpha_i + \alpha_j + \alpha_k + \alpha_l) + \tfrac{1}{3}(\beta_{ij} + \beta_{ik} + \beta_{il} + \beta_{jk} + \beta_{jl} + \beta_{kl}) = u + \tfrac{1}{3} v \tag{9} $$
Eq. 9 shows that breeding value equals the total additive value (u) plus 1/3 of the total digenic dominance (v), but it is conventional to refer to the additive value as “breeding value” because the contribution of digenic dominance diminishes exponentially: $(1/3)^t$ after t generations (Gallais 2003). This is analogous to the situation in diploids (and polyploids) with regard to additive x additive epistasis, as it contributes to breeding value with a coefficient of ½ but is generally omitted when referring to “breeding value.”
MATERIALS AND METHODS
Training population
Phenotype data for a training population (TP) of 571 round white clones was collected between 2012 and 2017 at the University of Wisconsin (UW-Madison) Hancock Agricultural Research Station (number of clones trialed per year in Table S1, File S1). Between 2012 and 2015, all clones were entries in the National Chip Processing Trial (NCPT), contributed by 11 public US breeding programs. In 2016 and 2017, FY3 and FY4 selections from the UW-Madison breeding program were included in addition to the NCPT clones. The NCPT uses a two-tier evaluation system, with one plot per location for Tier 1 clones and two plots per location for Tier 2 clones. FY3 clones were evaluated with a single plot, and FY4 clones were evaluated with two plots in 2016 and one plot in 2017. All plots contained 15 seed pieces, planted with 30 cm in-row spacing and 91 cm between rows. Trials were planted in late April and harvested in early September, with vine desiccation 2–3 weeks before harvest.
Phenotype data for three traits (yield, specific gravity, and fry color) are included in this study. Total yield is based on the weight of all harvested tubers and reported in Mg ha$^{-1}$. Specific gravity was determined by water displacement, using 2–3 kg of tubers per plot (Wang et al. 2017). Fry color was measured in March of each year after 6 months of storage (1 month at 12.8°C for wound healing, followed by 5 months at 8.9°C), using 1 mm slices fried for 130 s in vegetable oil at 182°C. For the 2012–2014 trials, fry color was measured on a 1-10 visual scale, while for the 2015–2017 trials it was measured on the L* lightness scale using the D25 HunterLab colorimeter (Hunter Associates Laboratory, Inc., Virginia USA). Fry color measurements on the visual scale (x) were converted to L* using the formula L* = -1.37x + 63.7, which is based on a linear regression analysis of 70 clones phenotyped with both methods.
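The visual-to-L* conversion is a single linear map, so the endpoints of the 1-10 scale can be checked directly. The helper name `visual_to_lstar` below is hypothetical; only the slope and intercept come from the text.

```python
def visual_to_lstar(x):
    """Convert a 1-10 visual fry color score to L* via L* = -1.37x + 63.7."""
    return -1.37 * x + 63.7

# Endpoints of the visual scale: higher visual scores map to lower (darker) L*
LIGHTEST = visual_to_lstar(1)
DARKEST = visual_to_lstar(10)
print(round(LIGHTEST, 2), round(DARKEST, 2))
```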
TP samples were genotyped with either version 1 or version 2 of the SolCAP potato SNP array, which have in common a set of 8303 markers used for this study (Hamilton et al. 2011; Felcher et al. 2012). Tetraploid genotype calls (coded 0–4) were made using version 1.6 of the ClusterCall package (Schmitz Carley et al. 2017) in R (R Development Core Team 2015), which calibrates the relationship between signal intensity from the array and allele dosage for each marker based on multiple F1 populations. In addition to the Atlantic x Superior, Wauseon x Lenape, and Rio Grande x Premier Russet populations used by Schmitz Carley et al. (2017), two more calibration populations were used: Waneta x Pike (da Silva et al. 2017; n = 184) and A06084-1TE x Castle Russet (n = 245). Default parameters were used except for min.train = 3, which required a marker to be called in at least 3 of the 5 calibration families. The curated marker set contained 3895 polymorphic SNPs with ≥95% concordance across samples (File S2).
The numerator relationship matrix A was calculated with R package AGHmatrix (Amadeu et al. 2016), using pedigree records maintained by the authors as well as a public database (van Berloo et al. 2007). After removing uninformative ancestors, there were 185 founders (clones with no parent) and 1138 total clones in the pedigree (File S3).
Genomic covariance matrices
In the THEORY section, average effects were derived for an arbitrary number of alleles. For bi-allelic SNPs, additional simplifications are possible. Consider alleles B and b with frequencies p and q, respectively. In this case Eq. 3 becomes $p\alpha_B + q\alpha_b = 0$, which reduces to the following well-known formulas involving $\alpha \equiv \alpha_B - \alpha_b$:
$$ \alpha_B = q\alpha, \qquad \alpha_b = -p\alpha \tag{10} $$
If X denotes the dosage of B, then the total additive value is
$$ u = X\alpha_B + (4 - X)\alpha_b = (X - 4p)\alpha \equiv W\alpha \tag{11} $$
where W is a centered genotype because 4p is the population mean of X. To obtain a similarly parsimonious expression for the total digenic dominance v, we follow the example of Wricke and Weber (1986) and introduce the parameter $\beta \equiv \beta_{BB} - 2\beta_{Bb} + \beta_{bb}$. When combined with Eq. 6, the result is (Supporting Methods, File S1)
$$ \beta_{BB} = q^2\beta, \qquad \beta_{Bb} = -pq\beta, \qquad \beta_{bb} = p^2\beta \tag{12} $$
$$ v = \left[ 6p^2 - 3pX + \tfrac{1}{2}X(X - 1) \right]\beta \equiv Q\beta \tag{13} $$
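Eq. 13 can be checked against Eq. 12 by counting gene pairs: a genotype with dosage X has X(X-1)/2 pairs of type BB, X(4-X) of type Bb, and (4-X)(3-X)/2 of type bb. The sketch below (function names are illustrative, not from the paper) compares the closed form for Q with that pair count and evaluates the first two moments of W and Q under Binomial(4, p) genotype frequencies, which is what panmictic equilibrium implies for a bi-allelic locus.

```python
from math import comb

def additive_code(X, p):
    """W = X - 4p (Eq. 11)."""
    return X - 4 * p

def dominance_code(X, p):
    """Q = 6p^2 - 3pX + X(X-1)/2 (Eq. 13)."""
    return 6 * p**2 - 3 * p * X + 0.5 * X * (X - 1)

def dominance_code_direct(X, p):
    """Sum the per-pair coefficients from Eq. 12 over the six gene pairs."""
    q = 1 - p
    n_BB = comb(X, 2)        # pairs with two copies of B
    n_Bb = X * (4 - X)       # pairs with one B and one b
    n_bb = comb(4 - X, 2)    # pairs with two copies of b
    return n_BB * q**2 - n_Bb * p * q + n_bb * p**2

def binomial_moment(code, p, power):
    """E[code(X)^power] for X ~ Binomial(4, p)."""
    return sum(comb(4, X) * p**X * (1 - p)**(4 - X) * code(X, p)**power
               for X in range(5))
```

For any allele frequency the two dominance codings agree, both codes have mean zero, and the second moments come out as 4pq for W and 6p²q² for Q.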
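For mixed model analysis, $G_{ij}$ quantifies the covariance between the additive values for clones i and j, relative to the additive genetic variance $\sigma_u^2$:
$$ G_{ij} = \sigma_u^{-2}\,\mathrm{cov}_\alpha[u_i, u_j] = \sigma_u^{-2}\,\mathrm{cov}_\alpha[W_i\alpha, W_j\alpha] = \sigma_u^{-2}\,W_i W_j\,\mathrm{var}_\alpha[\alpha] = \sigma_u^{-2}\,W_i W_j\,\sigma_\alpha^2 \tag{14} $$
The covariance in Eq. 14 involves the expectation with respect to $\alpha \sim N(0, \sigma_\alpha^2)$, but the additive genetic variance $\sigma_u^2$ is based on the theory of average effects, in which the expectation is with respect to genotypes. To relate the two variance components, the expectation with respect to both parameters is used:
$$ \sigma_u^2 = E_{\alpha,X}[u^2] - E_{\alpha,X}[u]^2 = E_\alpha[\alpha^2]\,E_X[W^2] - E_\alpha[\alpha]^2\,E_X[W]^2 = 4pq\,\sigma_\alpha^2 \tag{15} $$
Upon substituting Eq. 15 into Eq. 14, and extending the analysis to m loci in linkage equilibrium, the result is
$$ G_{ij} = \frac{\sum_{k=1}^{m} W_{ik} W_{jk}}{\sum_{k=1}^{m} 4 p_k q_k} \;\Rightarrow\; \mathbf{G} = \frac{\mathbf{W}\mathbf{W}^\top}{\sum_k 4 p_k q_k} \tag{16} $$
The digenic dominance matrix $D_{ij}$ is defined similarly, as the covariance between dominance values relative to the dominance genetic variance, based on the expectation with respect to $\beta \sim N(0, \sigma_\beta^2)$:
$$ D_{ij} = \sigma_v^{-2}\,\mathrm{cov}_\beta[v_i, v_j] = \sigma_v^{-2}\,\mathrm{cov}_\beta[Q_i\beta, Q_j\beta] = \sigma_v^{-2}\,\sigma_\beta^2\,Q_i Q_j \tag{17} $$
Using the following result for the dominance genetic variance:
$$ \sigma_v^2 = E_{\beta,X}[v^2] - E_{\beta,X}[v]^2 = \sigma_\beta^2\,E_X[Q^2] = 6p^2q^2\,\sigma_\beta^2 \tag{18} $$
the D matrix is
$$ \mathbf{D} = \frac{\mathbf{Q}\mathbf{Q}^\top}{\sum_k 6 p_k^2 q_k^2} \tag{19} $$
For the covariance between additive x additive epistatic effects, we used the Hadamard product of the G matrix, denoted by G#G (Henderson 1985; Su et al. 2012).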
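Eqs. 16 and 19 and the Hadamard product translate directly into a few lines of code. The sketch below is an illustration rather than the authors' implementation: it uses plain Python lists, a made-up 4-clone by 3-marker dosage matrix, and it assumes allele frequencies are estimated from the same sample, which the text does not state explicitly.

```python
def genomic_matrices(X):
    """Return (G, D, GG) from a dosage matrix.

    X: list of clones, each a list of allele dosages (0-4), one per marker.
    Allele frequencies are taken from the sample itself (an assumption here).
    """
    n, m = len(X), len(X[0])
    p = [sum(row[k] for row in X) / (4 * n) for k in range(m)]
    # Centered additive coding W (Eq. 11) and dominance coding Q (Eq. 13)
    W = [[row[k] - 4 * p[k] for k in range(m)] for row in X]
    Q = [[6 * p[k]**2 - 3 * p[k] * row[k] + 0.5 * row[k] * (row[k] - 1)
          for k in range(m)] for row in X]
    scale_g = sum(4 * pk * (1 - pk) for pk in p)           # denominator of Eq. 16
    scale_d = sum(6 * pk**2 * (1 - pk)**2 for pk in p)     # denominator of Eq. 19

    def cross(M, scale):
        return [[sum(a * b for a, b in zip(M[i], M[j])) / scale
                 for j in range(n)] for i in range(n)]

    G = cross(W, scale_g)
    D = cross(Q, scale_d)
    GG = [[G[i][j] ** 2 for j in range(n)] for i in range(n)]  # Hadamard product G#G
    return G, D, GG

# Hypothetical dosage data: 4 clones x 3 polymorphic markers
DOSAGE = [[0, 2, 4], [1, 2, 3], [2, 1, 0], [4, 3, 1]]
G, D, GG = genomic_matrices(DOSAGE)
```

Because W is centered on the sample frequencies in this sketch, each row of G sums to zero; that property would not hold if frequencies came from an external reference population.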
Mixed model analysis of the training population dataset
Stage-wise analysis of the multi-year TP dataset was performed using ASReml-R version 3 (Butler et al. 2009) and a diagonal weight matrix to account for the varying precision of the estimates in the first stage (Smith et al. 2001; Damesa et al. 2017). Stage One was an analysis within year, including blocking effects when present and modeling the genotype effect for each clone as fixed. The covariance matrix ($\boldsymbol{\Omega}_j$) for the vector of genotype effect estimates ($\hat{\boldsymbol{\mu}}_j$) in year j was obtained from the inverse of the coefficient matrix of the mixed model equations (Henderson 1975), which is returned as Cfixed in the asreml object. Stage Two was a multi-year analysis based on the following linear model:
$$ \hat{\mu}_{ij} = \phi + g_i + y_j + (gy)_{ij} + \varepsilon_{ij} \tag{20} $$
In Eq. 20 the parameter $\phi$ is the intercept, $g_i$ is a random effect for genotype, $y_j$ is a fixed effect for year, $(gy)_{ij}$ is a random effect for the genotype x year interaction, and the variance of the residual $\varepsilon_{ij}$ is $(w_{ij})^{-1}$, where $w_{ij}$ is the i-th diagonal element of $\boldsymbol{\Omega}_j^{-1}$ from Stage One (Damesa et al. 2017). File S4 contains the genotype effect estimates ($\hat{\mu}_{ij}$) and corresponding weights ($w_{ij}$) used in the multi-year analysis.
After fitting a baseline model with independent genotype effects (var[$g_i$] = $V_g$), five genetic models with different covariance structures for $g_i$ were tested (Table 1). The variance of the genotype x year interaction (var[$gy_{ij}$] = $V_{gy}$) was estimated in the baseline model and constrained at that value for the other models. This allowed for the partitioning of the genetic variance ($V_g$) into additive ($V_a$) and residual ($V_r$) genetic components for models A and G. Models G+GG, G+D, and G+GG+D involved the estimation of non-additive variance components for digenic dominance ($V_d$) and/or additive x additive epistasis ($V_{aa}$). Variances are reported using the standardization proposed by Legarra (2016) to make them comparable; for covariance matrix K, the parameter estimate is multiplied by the difference between the mean of the diagonal elements and the mean of all elements: $(\overline{K_{ii}} - \overline{K_{ij}})$. Goodness-of-fit was assessed by the Akaike Information Criterion (AIC), defined as the deviance plus twice the number of variance parameters (Piepho 2009).
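The standardization of variance estimates described above reduces to one scalar per covariance matrix. A minimal sketch, with hypothetical function names and toy matrices, is:

```python
def legarra_scale(K):
    """Mean of the diagonal elements minus the mean of all elements of K."""
    n = len(K)
    mean_diag = sum(K[i][i] for i in range(n)) / n
    mean_all = sum(sum(row) for row in K) / (n * n)
    return mean_diag - mean_all

def standardized_variance(estimate, K):
    """Rescale a variance parameter estimate so models with different K are comparable."""
    return estimate * legarra_scale(K)

# Toy matrices: independent effects (identity) and two moderately related clones
IDENTITY_4 = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
RELATED_2 = [[1.0, 0.5], [0.5, 1.0]]
print(legarra_scale(IDENTITY_4), legarra_scale(RELATED_2))
```

For the 4 x 4 identity matrix the factor is 1 - 1/4 = 0.75, and for the two-clone example it is 1 - 0.75 = 0.25, so the same raw parameter estimate corresponds to less variance among clones when the clones are more alike.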
Our objective was to compare how well the different covariance models predicted the total genotypic value g of unobserved clones. Each of the models in Table 1 has the form $g = \theta + r$, where $\theta$ is a sum of average effects and r is the residual genetic effect. Genomic predictions were calculated as BLUP[$\theta$] ≡ $\hat{\theta}$ from a Stage Two analysis (Eq. 20) without response values ($\hat{\mu}$) for clones in the validation set, using the variance parameter estimates from ASReml-R and custom scripts to solve Henderson’s mixed model equations (Henderson 1975). The validation data were calculated as BLUP[g] ≡ $\hat{g}$ from a Stage Two analysis with all clones, assuming independent clone effects (i.e., the baseline model). The reliability ($r^2_{g\hat{g}}$) of the validation data was calculated from the prediction error variance (PEV) by $1 - \mathrm{PEV}_i/V_g$ for clone i (Clark et al. 2012). To estimate prediction accuracy ($r_{\hat{\theta}g}$) from predictive ability (i.e., the Pearson correlation between the genomic predictions and validation data, $r_{\hat{\theta}\hat{g}}$), the latter was divided by the square-root of the mean reliability (i.e., broad-sense heritability) of the validation data (Dekkers 2007). Because the mean-squared error of the accuracy estimate worsens as the reliability of the validation data decreases (Estaghvirou et al. 2013), only validation data with reliability ≥0.6 were used.
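The accuracy calculation described above can be sketched in a few functions. This is an illustration under stated assumptions, not the authors' script: the function names and the toy numbers are hypothetical, and the 0.6 reliability threshold is applied per clone here, which matches the training-population analysis but may not be how every dataset was filtered.

```python
from math import sqrt
from statistics import mean

def reliability(pev, v_g):
    """Reliability of a validation BLUP: 1 - PEV_i / V_g."""
    return 1 - pev / v_g

def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def prediction_accuracy(pred, valid, rel, min_rel=0.6):
    """Predictive ability divided by the square root of the mean reliability."""
    keep = [i for i, r in enumerate(rel) if r >= min_rel]
    ability = pearson([pred[i] for i in keep], [valid[i] for i in keep])
    return ability / sqrt(mean(rel[i] for i in keep))

# Toy data: the fifth clone is dropped because its reliability is below 0.6
PRED = [1.0, 2.0, 3.0, 4.0, 10.0]
VALID = [2.0, 1.0, 4.0, 3.0, -5.0]
REL = [0.64, 0.64, 0.64, 0.64, 0.5]
ACCURACY = prediction_accuracy(PRED, VALID, REL)
```

In the toy data the predictive ability of the four retained clones is 0.6 and the mean reliability is 0.64, so the estimated accuracy is 0.6 / 0.8 = 0.75.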
Genome-wide prediction in F1 populations
As part of various research projects, six unselected F1 populations (Table 2) were evaluated at the same location as the TP during the same time period. Populations W12011, W12012, and W12060 were evaluated for yield and specific gravity with a single plot of 12 plants per clone in 2015 and 2016. Populations W9817 and W10010 were evaluated for yield with two 8-plant plots in 2013 and one 20-plant plot in 2014 (Rak et al. 2015). Population WxL was evaluated with a single plot for specific gravity for four years (2012–2015), but yield was only measured in 2014 and 2015; there were 6 plants per plot in 2012 and 10 plants per plot in 2013–2015 (Frederick 2017). Phenotype data were analyzed separately for each F1 population, using a linear model with fixed effects for year and independent random effects for clone. Genetic and residual variance components were estimated with ASReml-R and used to calculate BLUPs for validation. The BLUPs and corresponding reliabilities (which were used to estimate accuracy from predictive ability, as described above) are provided in File S5, except that yield BLUPs for W12011 and W12060 were excluded because of low reliability (<0.6).
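The F1 populations were genotyped using the same SNP array as the training population. Tetraploid genotype calls (coded 0–4) were made two ways: (i) as described above (using CC.anypop in the ClusterCall package), for the same set of 3895 markers selected for the TP, and (ii) using the CC.bipop function. Polymorphic markers with identical calls for the two approaches were used for prediction (File S6). Genomic predictions for each F1 population were calculated as BLUP[$\theta$] from a Stage Two analysis of the entire TP (Eq. 20), using the TP variance estimates with multi-population covariance matrices (“G_New” from Wientjes et al. 2017) to account for different allele frequencies in the TP vs. F1 population. If $\tilde{\mathbf{W}}$ and $\tilde{\mathbf{Q}}$ denote scaled versions of the matrices defined previously:
$$ \tilde{\mathbf{W}} = \frac{\mathbf{W}}{\sqrt{\sum_k 4 p_k q_k}}, \qquad \tilde{\mathbf{Q}} = \frac{\mathbf{Q}}{\sqrt{\sum_k 6 p_k^2 q_k^2}} \tag{21} $$
then, using the subscripts 1 and 2 to denote the different populations, the covariance matrices are
$$ \mathbf{G} = \begin{bmatrix} \tilde{\mathbf{W}}_1\tilde{\mathbf{W}}_1^\top & \tilde{\mathbf{W}}_1\tilde{\mathbf{W}}_2^\top \\ \tilde{\mathbf{W}}_2\tilde{\mathbf{W}}_1^\top & \tilde{\mathbf{W}}_2\tilde{\mathbf{W}}_2^\top \end{bmatrix}, \qquad \mathbf{D} = \begin{bmatrix} \tilde{\mathbf{Q}}_1\tilde{\mathbf{Q}}_1^\top & \tilde{\mathbf{Q}}_1\tilde{\mathbf{Q}}_2^\top \\ \tilde{\mathbf{Q}}_2\tilde{\mathbf{Q}}_1^\top & \tilde{\mathbf{Q}}_2\tilde{\mathbf{Q}}_2^\top \end{bmatrix} \tag{22} $$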
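One way to read Eqs. 21 and 22 is that each population is centered and scaled with its own allele frequencies, the scaled matrices are stacked by rows, and a single cross-product then yields all four blocks. The sketch below follows that reading; it assumes the allele frequencies are estimated within each population's sample, and the function names and dosage values are invented for the example.

```python
from math import sqrt

def scaled_codes(X):
    """Return (W~, Q~) for one population, scaled as in Eq. 21.

    Allele frequencies are estimated from X itself (an assumption of this sketch).
    """
    n, m = len(X), len(X[0])
    p = [sum(r[k] for r in X) / (4 * n) for k in range(m)]
    sw = sqrt(sum(4 * pk * (1 - pk) for pk in p))
    sq = sqrt(sum(6 * pk**2 * (1 - pk)**2 for pk in p))
    Wt = [[(r[k] - 4 * p[k]) / sw for k in range(m)] for r in X]
    Qt = [[(6 * p[k]**2 - 3 * p[k] * r[k] + 0.5 * r[k] * (r[k] - 1)) / sq
           for k in range(m)] for r in X]
    return Wt, Qt

def cross_product(A, B):
    """Matrix product A B^T for list-of-lists matrices."""
    return [[sum(a * b for a, b in zip(ra, rb)) for rb in B] for ra in A]

def multi_population(X1, X2):
    """Block covariance matrices G and D of Eq. 22 for two populations."""
    W1, Q1 = scaled_codes(X1)
    W2, Q2 = scaled_codes(X2)
    W, Q = W1 + W2, Q1 + Q2   # stacking rows reproduces the 2 x 2 block layout
    return cross_product(W, W), cross_product(Q, Q)

# Hypothetical dosage data for a training set (4 clones) and an F1 family (3 clones)
TP_DOSAGE = [[0, 2, 4], [1, 2, 3], [2, 1, 0], [4, 3, 1]]
F1_DOSAGE = [[1, 1, 2], [3, 0, 2], [2, 4, 1]]
G_MULTI, D_MULTI = multi_population(TP_DOSAGE, F1_DOSAGE)
```

Rows 0-3 of `G_MULTI` then index the first population and rows 4-6 the second, so the off-diagonal block holds the covariances used to carry information from the training set to the F1 clones.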
Data availability
All marker, pedigree, and phenotype data needed to reproduce the results in this manuscript are provided in Supplemental Files S2–S6:
File S2, Marker data for the training population (.csv)
File S3, Pedigree table for the training population (.csv)
File S4, Clone means by year for Stage Two analysis of the training population (.csv)
File S5, Clone BLUPs for the six F1 populations (.csv)
File S6, Marker data for the F1 populations (.csv)
RESULTS
Pedigree information was used to calculate the numerator relationship, or A, matrix for a training population (TP) of 571 clones. All but 13 clones had a pedigree depth (maximum number of generations from a founder) of at least 7, with a median depth of 10 generations (distribution in Fig. S1, File S1). For autotetraploids, diagonal element $A_{ii}$ is related to the inbreeding coefficient $F_i$ (defined as the probability that two randomly chosen genes, sampled without replacement, are identical by descent) via the equation $A_{ii} = 1 + 3F_i$ (Gallais 2003). Values for $A_{ii}$ ranged from 1 to 1.55, with a mean of 1.05, so the most inbred clone in the population had $F_i$ = 0.18.
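Inverting $A_{ii} = 1 + 3F_i$ reproduces the inbreeding values quoted above. The function name below is hypothetical.

```python
def inbreeding_from_diagonal(a_ii):
    """Solve A_ii = 1 + 3 F_i for the autotetraploid inbreeding coefficient F_i."""
    return (a_ii - 1.0) / 3.0

# Most inbred clone and population mean reported in the text
F_MAX = inbreeding_from_diagonal(1.55)
F_MEAN = inbreeding_from_diagonal(1.05)
print(round(F_MAX, 2), round(F_MEAN, 3))
```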
In the G-BLUP method of genomic selection, the covariance between additive values is proportional to a G matrix calculated from markers (instead of A). We calculated G using 3895 polymorphic markers from the potato SNP array, for which accurate allele dosage information was available. The overall scaling of G is such that the mean of the diagonal elements equals 1 at panmictic equilibrium, which is very near the observed value of 0.99. As further confirmation, the observed frequency of heterozygotes was inspected as a function of allele frequency and found to be in close agreement with the expected values under panmixis (Fig. S2, File S1).
G has been called the "realized relationship matrix" because it captures Mendelian segregation around the expected value A (Hayes et al. 2009). This connection is the motivation for regression analysis between G and A (VanRaden 2008), shown in Figure 1 for our potato dataset. The dashed line is the fit of the linear model when all off-diagonal elements are used (G = 0.66A − 0.06, R² = 0.41), which underestimated G at high values of A. By excluding very distant relationships (A < 0.05), the model fit improved overall and at the upper end, as shown by the solid black line (G = 0.79A − 0.09, R² = 0.51). These results are based on the assumption of no double reduction, i.e., that diploid gametes do not contain genes from sister chromatids, but in potato the probability of double reduction varies from 0 at the centromere to as high as 0.07 at the telomere (Bourke et al. 2015). In the context of a polygenic trait, with genes distributed across the entire chromosome, the effective value of the double reduction parameter is expected to be small. When the probability of double reduction was increased to 0.05 for computing A, the goodness-of-fit and intercept for the regression were unaffected but the slope decreased from 0.79 to 0.73.
Figure 2 compares the off-diagonal elements of the digenic dominance covariance matrix D against the corresponding elements of G. For close relationships, $G_{ij}$ ≥ 0.4, G and D were highly correlated (r = 0.81), but when these pairs were excluded the correlation dropped to 0.08. The overall scaling of D is such that the mean of its diagonal elements equals 1 at panmictic equilibrium, which is very near the observed value of 0.99.
A potential concern when using genomic relationship matrices is that estimates of genetic variance may be too low due to incomplete marker-QTL LD. Yang et al. (2010) presented a method to assess (and correct) this issue. First, the markers are randomly partitioned, such that one half represent QTL and the other half markers; then G (or D) is calculated using each half separately; and finally $G_{QTL}$ is regressed onto $G_{mark}$. For both the G and D matrices in our dataset, the mean regression coefficient after 100 iterations was 1.00 (SD 0.01), indicating sufficient marker density under the assumption that markers are sampled from the same distribution as QTL.
Variance components
Phenotype data for three economically important traits were analyzed: total yield, specific gravity (as a proxy for dry matter content), and potato chip fry color after six months of storage. Initially, genotype effects were modeled as independent to estimate the total genetic variance ($V_g$) and the variance of the genotype x year interaction ($V_{gy}$). The $V_g$ estimate was higher than $V_{gy}$ for all traits, ranging from 2.7 times higher for yield to 5.6 times higher for specific gravity (Table 3).
By adding another random effect to the baseline model, with covariance proportional to A or G, the genetic variance ($V_g$) was partitioned into additive ($V_a$) and residual genetic variance ($V_r$), the latter corresponding to the independent clone effect (Table 1). Both A and G lowered the AIC compared to the baseline model for all traits, with the G matrix producing a better fit for yield and fry color vs. the A matrix for specific gravity (Figure 3). Using A, the proportion of genetic variance due to additive effects was 0.52 for specific gravity, 0.59 for yield, and 0.76 for fry color (Figure 4). When G was used, the additive genetic variance estimates were reduced by 0.12–0.18 of the total genetic variance, depending on the trait.
For specific gravity and fry color, including additive x additive epistasis lowered the AIC compared to the additive G-BLUP model, but when dominance was included the AIC increased (Fig. 3). For these traits, a substantial amount of the estimated additive variance in the G model became additive x additive epistasis in the G+GG model: $V_a$ dropped from 34 to 20% of $V_g$ for specific gravity and from 63 to 45% for fry color, with 44–51% of the genetic variance captured by G#G (Fig. 4, standard errors in Table S2, File S1). For yield, neither dominance nor additive x additive epistasis improved the AIC compared to G-BLUP, and only 10% of the genetic variance was captured by D or G#G compared to 45% for the residual clone effect.
Prediction accuracy
Many studies use random cross-validation to assess the accuracy of genome-wide prediction. However, in the context of a pedigreed breeding population, this approach leads to training set individuals that are descendants of individuals in the validation set, which is unrepresentative of how genomic selection will be used in practice and may produce unrealistically high accuracies. To avoid this pitfall, we used the pedigree depth metric to partition the population into a set of 168 candidates for selection (depth ≥ 12) and a set of 403 clones ancestral to this group (depth < 12) as the training set. The selection candidates were further narrowed by excluding clones with insufficiently reliable data for validation, leaving 54 clones for yield, 132 clones for specific gravity, and 49 clones for fry color (with mean reliability in the range 0.71–0.72 for all traits).
Figure 5 shows the accuracy (left) and regression coefficient (right) when using each of the models in Table 1 to predict total genotypic value in the validation set. Prediction accuracy using only the A matrix was just over 0.5 for total yield vs. 0.4 for specific gravity and fry color. Replacing A with G improved accuracy for yield by 0.03 and fry color by 0.06 but decreased the accuracy for specific gravity by 0.07, which is consistent with the trend observed for AIC. Including dominance improved yield accuracy by 0.01, and including epistasis improved specific gravity accuracy by 0.05. Based on the AIC results for fry color, we expected higher accuracy with the G+GG model, but this was not observed. Including epistasis reduced prediction bias for fry color and specific gravity, with regression coefficients over 0.9 for the G+GG model compared to 0.75–0.78 for G-BLUP.
To investigate the effect of training population size on accuracy, 200 random subsets of the TP were taken at N = 100, 200, and 300 clones (Fig. S3, File S1). For all three traits, prediction accuracy decreased as the TP was reduced. Fry color accuracy was the most sensitive, dropping by 0.28 when population size was reduced from 403 to 100, compared to accuracy decreases of 0.14 and 0.19 for yield and specific gravity, respectively.
We also determined accuracy when using the entire (N = 571) training population to predict yield and specific gravity in six unselected F1 populations (Table 4). The F1 populations ranged in size from 48 to 167 clones, and the number of polymorphic markers ranged from 1376 to 2311 (Table 2). Several of the parents had little pedigree relationship to the TP because they were russet clones, which is a distinct market category from the round white, chip processing type. Prediction accuracies ranged from 0.06 to 0.63 with G-BLUP, with no discernible connection between accuracy and pedigree relationship to the TP. The G+GG+D model performed very similarly, with no difference in average accuracy (to two decimal places) across the eight cases in Table 4. To assess the value of tetraploid allele dosage, predictions were made with “diploidized” marker data ($G_{dip}$), in which the three heterozygotes were recoded to be identical. The accuracy of the $G_{dip}$ model was consistently lower, with an average loss of 0.13.
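The "diploidized" recoding collapses dosages 1, 2, and 3 into a single heterozygous class. The text does not say which numeric code was assigned to that class, so the value 2 used below is an assumption, as is the function name.

```python
def diploidize(dosage, het_code=2):
    """Recode tetraploid dosage (0-4) so that all three heterozygotes are identical.

    het_code is the value given to dosages 1, 2, and 3; the choice of 2 is an
    assumption of this sketch, not something stated in the paper.
    """
    return dosage if dosage in (0, 4) else het_code

RECODED = [diploidize(x) for x in range(5)]
print(RECODED)
```

After recoding, a marker distinguishes only three genotype classes, which is the information a diploid-style genotype call would provide.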
DISCUSSION
This is the first study to connect the classical theory of genetic variance partitioning in tetraploids with covariance matrices constructed from genome-wide, allele dosage information. Our results are based on a two-stage analysis in which the genotype estimate for each clone x environment combination was calculated in Stage One assuming independent effects, and in Stage Two genomic covariance matrices were used. We note that not all references to two-stage analysis in the literature employ this convention; often a single genotype mean across all environments is estimated in the first stage. In our dataset there were 849 genotype x year means estimated in Stage One for the 571 clones (File S4), and this partial replication is expected to improve the precision of the variance component estimates compared to analyzing a single mean per genotype in Stage Two (Kruijer et al. 2015).
The partial replication across years also allowed for explicit modeling of a residual genetic effect with no covariance structure, in addition to the additive, dominance, and epistatic random effects. We interpret the residual genetic effect to include higher order non-additive effects (e.g., trigenic dominance, additive x dominance epistasis) and genetic variance not captured by the markers. The latter might appear to be ruled out based on our analysis of the G and D matrices using the Yang et al. (2010) method, but this assumes QTL are drawn from the same distribution as the markers. In reality, low-frequency alleles are under-represented on the potato 8303 SNP array and are expected to contribute residual genetic variance (Vos et al. 2015). The estimates for $V_d$ and $V_{aa}$ were sensitive to whether the residual genetic effect was included in the model. Without it, $V_{aa}$ for yield was estimated at 40 (SE 14) Mg$^2$ ha$^{-2}$, which is 45% of the total genetic variance (88 Mg$^2$ ha$^{-2}$). When the residual effect was included, most of this variance shifted into $V_r$. This phenomenon and the large standard errors of the estimates (Table S2, File S1) suggest that the partitioning of the non-additive genetic variance is uncertain, probably because of limited population size.
At first glance, our finding that more of the genetic variance was additive for yield (45%) compared to specific gravity (20%) seems unexpected. Diallel studies have typically found the ratio between specific and general combining ability (SCA/GCA) to be higher for yield. Tai (1976) reported SCA/GCA = 3.8 for yield vs. 0.6 for specific gravity, and Bradshaw et al. (2000) reported SCA/GCA = 0.75 for yield vs. 0.06 for specific gravity. Whereas these earlier studies used unselected populations, the clones in our training population had been selected for high specific gravity at least once, and in many cases for 2–3 years. Specific gravity (which is closely correlated with dry matter content) is one of the most important traits for the chip processing market, and strong selection early in the variety development process is practiced because the trait shows relatively little genotype x environment interaction (Table 3; Wang et al. 2017). Our results suggest that in the tail of the phenotype distribution for specific gravity, the partitioning of genetic variance is shifted toward non-additive effects.
A unique feature of our study compared to previous reports of genomic selection in autotetraploid species, such as alfalfa (Li et al. 2015; Annichiarico et al. 2015) and potato (Habyarimana et al. 2017; Sverrisdóttir et al. 2017), was the use of SNP array data rather than genotyping-by-sequencing (GBS). A major benefit of the SNP array for polyploids is the ability to accurately determine allele dosage (Voorrips et al. 2011; Schmitz Carley et al. 2017), but the cost of the array is determined by sales volume and can be prohibitively expensive for genomic selection in small breeding programs or minor crops. GBS methods achieve low per-sample costs by pooling many samples into one library for sequencing, but much higher read depth (per sample-marker combination) is needed for accurate genotype assignment in tetraploids compared to heterozygous diploids. By comparing GBS with KASP markers in potato, Uitdewilligen et al. (2013) recommended 60–80X read depth to differentiate the three heterozygous genotypes, which agrees well with theoretical calculations (Endelman, unpublished). For genomic selection, a pressing question is whether paying for more sequencing to improve estimates of allele dosage provides a return on investment, in terms of prediction accuracy and ultimately genetic gain. A completely general answer may be elusive due to complex interactions between GBS method, population, and phenotype, but our finding that “diploidization” of the marker data consistently reduced prediction accuracy in the F1 populations (by 0.13 on average) highlights the need for further research.
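The dependence of dosage calling on read depth can be illustrated with a deliberately simplified model. The sketch below is not the unpublished calculation cited above. It assumes that reads at a bi-allelic site are independent binomial draws with success probability dosage/ploidy, with no sequencing error or allelic bias, and that dosage is called by maximum likelihood with a uniform prior. All function names are hypothetical.

```python
from math import comb


def binom_pmf(k, n, prob):
    """Probability of k alternate-allele reads out of n under a binomial model."""
    return comb(n, k) * prob**k * (1 - prob) ** (n - k)


def call_dosage(k, n, ploidy=4):
    """Maximum-likelihood dosage given k alternate reads out of n (uniform prior)."""
    likelihoods = [binom_pmf(k, n, d / ploidy) for d in range(ploidy + 1)]
    return max(range(ploidy + 1), key=lambda d: likelihoods[d])


def prob_correct(true_dosage, depth, ploidy=4):
    """Probability that the maximum-likelihood call equals the true dosage."""
    prob = true_dosage / ploidy
    return sum(
        binom_pmf(k, depth, prob)
        for k in range(depth + 1)
        if call_dosage(k, depth, ploidy) == true_dosage
    )


if __name__ == "__main__":
    for depth in (10, 20, 40, 60, 80):
        tetra = prob_correct(2, depth, ploidy=4)  # duplex heterozygote in a tetraploid
        dip = prob_correct(1, depth, ploidy=2)    # heterozygote in a diploid
        print(f"depth {depth:3d}: tetraploid {tetra:.3f}, diploid {dip:.3f}")
```

Under these assumptions a diploid heterozygote is miscalled only when every read carries the same allele, whereas the tetraploid heterozygous classes must be separated by read ratio, which is why the required depth is so much higher.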
In plant breeding, both the total genotypic value and additive value are relevant for selection. For many crops, the unit of commercial production (i.e., inbred, F1 hybrid, or vegetative clone) is the same genotype evaluated by the breeder, and therefore selection should be based on total genotypic value. When selecting new parents, however, only the additive value should be considered because non-additive effects are less efficiently transmitted to progeny. We have demonstrated the feasibility of this paradigm for potato using the G+GG+D model, although questions remain regarding its optimal implementation. For many issues, further progress will require larger populations genotyped with less ascertainment bias.
Acknowledgments
Financial support was provided by Potatoes USA, the USDA National Institute of Food and Agriculture, Award Number 2014-67013-22418, and USDA Hatch Projects 1002731 and 1013047.
Author contributions
JBE designed the study. JBE and CASC analyzed the data and drafted the manuscript. All authors contributed germplasm, data, or financial resources.
Literature Cited
Amadeu, R. R., C. Cellon, J. W. Olmstead, A. A. Garcia, M. F. Resende et al., 2016 AGHmatrix: R package to construct relationship matrices for autotetraploid and diploid species: a blueberry example. Plant Genome 9: 3.
Annichiarico, P., N. Nazzicari, X. Li, Y. Wei, L. Pecetti et al., 2015 Accuracy of genomic selection for alfalfa biomass yield in different reference populations. BMC Genomics 16: 1020.
Bernardo, R., 1994 Prediction of maize single-cross performance using RFLPs and information from related hybrids. Crop Sci. 34: 20–25.
Bourke, P. M., R. E. Voorrips, R. G. F. Visser, and C. Maliepaard, 2015 The double-reduction landscape in tetraploid potato as revealed by a high-density linkage map. Genetics 201: 853–863.
Bradshaw, J. E., D. Todd, and R. N. Wilson, 2000 Use of tuber progeny tests for genetical studies as part of a potato (Solanum tuberosum subsp. tuberosum) breeding programme. Theor. Appl. Genet. 100: 772–781.
Brown, J., and P. D. S. Caligari, 1989 Cross prediction in a potato breeding programme by evaluation of parental material. Theor. Appl. Genet. 77: 246–252.
Butler, D. G., B. R. Cullis, A. R. Gilmour, and B. J. Gogel, 2009 ASReml-R reference manual Version 3. Queensland Department of Primary Industries and Fisheries, Brisbane.
Clark, S. A., J. M. Hickey, H. D. Daetwyler, and J. H. J. van der Werf, 2012 The importance of information on relatives for the prediction of genomic breeding values and the implications for the makeup of reference data sets in livestock breeding schemes. Genet. Sel. Evol. 44: 4.
da Silva, W. L., J. Ingram, C. A. Hackett, J. J. Coombs, D. Douches et al., 2017 Mapping loci that control tuber and foliar symptoms caused by PVY in autotetraploid potato (Solanum tuberosum L.). G3 (Bethesda) 7: 3587–3595.
Damesa, T. M., J. Möhring, M. Worku, and H. P. Piepho, 2017 One step at a time: Stage-wise analysis of a series of experiments. Agron. J. 109: 845–857.
Dekkers, J. C. M., 2007 Prediction of response to marker-assisted and genomic selection using selection index theory. J. Anim. Breed. Genet. 124: 331–341.
Estaghvirou, S. B. O., J. O. Ogutu, T. Schulz-Streeck, C. Knaak, M. Ouzunova et al., 2013 Evaluation of approaches for estimating the accuracy of genomic prediction in plant breeding. BMC Genomics 14: 860.
Felcher, K. J., J. J. Coombs, A. N. Massa, C. N. Hansey, J. P. Hamilton et al., 2012 Integration of two diploid potato linkage maps with the potato genome sequence. PLoS ONE 7: e36347.
Fisher, R. A., 1941 Average excess and average effect of a gene substitution. Ann. Eugen. 11: 53–63.
Frederick, C. M., 2017 Exploration of novel phenotyping techniques and identification of quantitative trait loci for chip processing potatoes (PhD thesis). University of Wisconsin, Madison.
Gallais, A., 2003 Quantitative genetics and breeding methods in autopolyploid plants. INRA, Paris.
Gopal, J., 1998 General combining ability and its repeatability in early generations of potato breeding programmes. Potato Res. 41: 21–28.
Habier, D., R. L. Fernando, and J. C. M. Dekkers, 2007 The impact of genetic relationship information on genome-assisted breeding values. Genetics 177: 2389–2397.
Habyarimana, E., B. Parisi, and G. Mandolino, 2017 Genomic prediction for yields, processing and nutritional quality traits in cultivated potato (Solanum tuberosum L.). Plant Breeding 136: 245–252.
Hamilton, J. P., C. N. Hansey, B. R. Whitty, K. Stoffel, A. N. Massa et al., 2011 Single nucleotide polymorphism discovery in elite North American potato germplasm. BMC Genomics 12: 302.
Hayes, B. J., P. M. Visscher, and M. E. Goddard, 2009 Increased accuracy of artificial selection by using the realized relationship matrix. Genet. Res. Camb. 91: 47–60.
Henderson, C. R., 1975 Best linear unbiased estimation and prediction under a selection model. Biometrics 31: 423–447.
Henderson, C. R., 1985 Best linear unbiased prediction of non-additive genetic merits in noninbred populations. J. Anim. Sci. 60: 111–117.
Jiang, Y., and J. C. Reif, 2015 Modeling epistasis in genomic selection. Genetics 201: 759–768.
Kempthorne, O., 1955a The theoretical values of correlations between relatives in random mating populations. Genetics 40: 153–167.
Kempthorne, O., 1955b The correlation between relatives in a simple autotetraploid population. Genetics 40: 168–174.
Kempthorne, O., 1955c The correlations between relatives in random mating populations. Cold Spring Harb. Symp. Quant. Biol. 20: 60–78.
Kempthorne, O., 1957 An introduction to genetic statistics. John Wiley & Sons, New York.
Kerr, R. J., L. Li, B. Tier, G. W. Dutkowski, and T. A. McRae, 2012 Use of the numerator relationship matrix in genetic analysis in autopolyploid species. Theor. Appl. Genet. 124: 1271–1282.
Kruijer, W., M. P. Boer, M. Malosetti, P. J. Flood, B. Engel et al., 2015 Marker-based estimation of heritability in immortal populations. Genetics 199: 379–398.
Legarra, A., 2016 Comparing estimates of genetic variance across different relationship models. Theor. Pop. Biol. 107: 26–30.
Li, X., Y. Wei, A. Acharya, J. L. Hansen, J. L. Crawford et al., 2015 Genomic prediction of biomass yield in two selection cycles of a tetraploid alfalfa breeding program. Plant Genome 8: 2.
Lopez-Pardo, R., L. Barandalla, E. Ritter, and J. I. Ruiz De Galarreta, 2013 Validation of molecular markers for pathogen resistance in potato. Plant Breeding 132: 246–251.
Maris, B., 1989 Analysis of an incomplete diallel cross among three ssp. tuberosum varieties and seven long-day adapted ssp. andigena clones of the potato (Solanum tuberosum L.). Euphytica 41: 163–182.
Meuwissen, T. H. E., B. J. Hayes, and M. E. Goddard, 2001 Prediction of the total genetic value using genome-wide dense marker maps. Genetics 157: 1819–1829.
Muñoz, P. R., M. F. R. Resende, S. A. Gezan, M. D. V. Resende, G. de los Campos et al., 2014 Unraveling additive from nonadditive effects using genomic relationship matrices. Genetics 198: 1759–1768.
Neele, A. E. F., H. J. Nab, and K. M. Louwes, 1991 Identification of superior parents in a potato breeding programme. Theor. Appl. Genet. 82: 264–272.
Nejati-Javaremi, A., C. Smith, and J. P. Gibson, 1997 Effect of total allelic relationship on accuracy of evaluation and response to selection. J. Anim. Sci. 75: 1738–1745.
Piepho, H. P., 2009 Ridge regression and extensions for genomewide selection in maize. Crop Sci. 49: 1165–1176.
Plaisted, R. L., L. Sanford, W. T. Federer, A. E. Kehr, and L. C. Peterson, 1962 Specific and general combining ability for yield in potatoes. Am. Potato J. 39: 185–197.
R Development Core Team, 2015 R: a language and environment for statistical computing. R Foundation for Statistical Computing, Vienna.
Rak, K., and J. P. Palta, 2015 Influence of mating structure on agronomic performance, chip fry color, and genetic distance among biparental tetraploid families. Am. J. Potato Res. 92: 518–535.
Schmitz Carley, C. A., J. J. Coombs, D. S. Douches, P. C. Bethke, J. P. Palta et al., 2017 Automated tetraploid genotype calling by hierarchical clustering. Theor. Appl. Genet. 130: 717–726.
Slater, A. T., N. O. I. Cogan, J. W. Forster, B. J. Hayes, and H. D. Daetwyler, 2016 Improving genetic gain with genomic selection in autotetraploid potato. Plant Genome 9: 3.
Slater, A. T., G. M. Wilson, N. O. I. Cogan, J. W. Forster, and B. J. Hayes, 2014 Improving the analysis of low heritability complex traits for enhanced genetic gain in potato. Theor. Appl. Genet. 127: 809–820.
Smith, A. B., B. R. Cullis, and A. Gilmour, 2001 The analysis of crop variety evaluation data in Australia. Aust. N. Z. J. Stat. 43: 129–145.
Su, G., O. F. Christensen, T. Ostersen, M. Henryon, and M. S. Lund, 2012 Estimating additive and non-additive genetic variances and predicting genetic merits using genome-wide dense single nucleotide polymorphism markers. PLoS ONE 7: e45293.
Sverrisdóttir, E., S. Byrne, H. E. R. Sundmark, H. Ø. Johnsen, H. G. Kirk et al., 2017 Genomic prediction of starch content and chipping quality in tetraploid potato using genotyping-by-sequencing. Theor. Appl. Genet. 130: 2091–2108.
Tai, G. C. C., 1976 Estimation of general and specific combining abilities in potato. Can. J. Genet. Cytol. 18: 463–470.
Uitdewilligen, J. G. A. M. L., A. M. A. Wolters, B. B. D’hoop, T. J. A. Borm, R. G. F. Visser et al., 2013 A next-generation sequencing method for genotyping-by-sequencing of highly heterozygous autotetraploid potato. PLoS ONE 8: e62355.
van Berloo, R., R. C. B. Hutten, H. J. van Eck, and R. G. F. Visser, 2007 An online potato pedigree database resource. Potato Res. 50: 45–57.
VanRaden, P. M., 2008 Efficient methods to compute genomic predictions. J. Dairy Sci. 91: 4414–4423.
Vitezica, Z. G., L. Varona, and A. Legarra, 2013 On the additive and dominant variance and covariance of individuals within the genomic selection scope. Genetics 195: 1223–1230.
Voorrips, R. E., G. Gort, and B. Vosman, 2011 Genotype calling in tetraploid species from bi-allelic marker data using mixture models. BMC Bioinformatics 12: 172.
Vos, P. G., J. G. A. M. L. Uitdewilligen, R. E. Voorrips, R. G. F. Visser, and H. J. van Eck, 2015 Development and analysis of a 20K SNP array for potato (Solanum tuberosum): an insight into the breeding history. Theor. Appl. Genet. 128: 2387–2401.
Wang, Y., L. B. Snodgrass, P. C. Bethke, A. J. Bussan, D. G. Holm et al., 2017 Reliability of measurement and genotype × environment interaction for potato specific gravity. Crop Sci. 57: 1–7.
Wientjes, Y. C. J., P. Bijma, J. Vandenplas, and M. P. L. Calus, 2017 Multi-population genomic relationships for estimating current genetic variances within and genetic correlations between populations. Genetics 207: 503–515.
Wricke, G., and W. E. Weber, 1986 Quantitative genetics and selection in plant breeding. Walter de Gruyter & Co., Berlin.
Xu, S., 2013 Mapping quantitative trait loci by controlling polygenic background effects. Genetics 195: 1209–1222.
Yang, J., B. Benyamin, B. P. McEvoy, S. Gordon, A. K. Henders et al., 2010 Common SNPs explain a large proportion of the heritability for human height. Nature Genet. 42: 565–569.
FIGURE CAPTIONS
Figure 1 Comparison of off-diagonal elements for the additive covariance matrices estimated from markers (G) vs. pedigree (A), for the training population of 571 clones. The dashed line is the linear regression using all pairs of clones (G = 0.66A – 0.06, R² = 0.41), and the solid line is the regression (G = 0.79A – 0.09, R² = 0.51) when distant relationships (A < 0.05) are excluded.
Figure 2 Comparison of off-diagonal elements for the digenic dominance (D) vs. additive (G) covariance matrices estimated from markers, for the training population of 571 clones.
Figure 3 Comparing goodness-of-fit in the training population for different genetic covariance structures (see Table 1), using the Akaike Information Criterion (AIC) relative to a baseline model with independent clone effects.
Figure 4 Partitioning of genetic variance in the training population of 571 clones, for five different covariance structures (see Table 1). Variance symbols are Va = additive, Vaa = additive × additive epistasis, Vd = digenic dominance, and Vr = residual genetic variance (for independent clone effects).
Figure 5 Prediction accuracy (left) and bias (right) when using all clones with pedigree depth < 12 to predict clones with pedigree depth ≥ 12.
34
TABLES
Table 1 Covariance structures for the total genotypic value (g) in Eq. 20.
Model     var[g]
A         I Vr + A Va
G         I Vr + G Va
G+GG      I Vr + G Va + (G#G) Vaa
G+D       I Vr + G Va + D Vd
G+GG+D    I Vr + G Va + (G#G) Vaa + D Vd
Table 2 Parentage, population size, and number of polymorphic markers in unselected F1 populations.
Population  Mother (A90)            Father (A90)             <G90>  Pop Size  No. Markers
W12011      W6360-1rus (0.01)       Silverton Russet (0.01)  0.059  58        1580
W12012      W8736-6rus (0.06)       Silverton Russet (0.01)  0.062  55        1744
W12060      Russet Norkotah (0.01)  Canela Russet (0.01)     0.062  65        1376
W9817       Liberator (0.16)        W4013-1 (0.12)           0.070  76        2311
W10010      Tundra (0.31)           Bannock Russet (0.01)    0.066  48        1629
W×L         Wauseon (0.20)          Lenape (0.25)            0.070  167       1999
A90 = 90th percentile of pedigree relationship with the training population (TP). G90 = 90th percentile of G coefficient between an F1 individual and the TP. <G90> is the average G90 for the F1 population.
Table 3 Variance parameter estimates for the total genotypic value (Vg) and genotype × year effect (Vgy).
Variance Component  Yield (Mg² ha⁻²)  Specific Gravity  Fry Color (L*²)
Vg                  88                26.5 × 10⁻⁶       5.1
Vgy                 33                4.7 × 10⁻⁶        1.4
Table 4 Prediction accuracy in unselected F1 populations.
Population  Trait (Reliability a)  G model accuracy  G+GG+D accuracy  Gdip model accuracy
W12011      SpGr (0.74)            0.63              0.61             0.36
W12012      Yield (0.66)           0.31              0.31             0.06
W12012      SpGr (0.70)            0.27              0.32             0.16
W12060      SpGr (0.84)            0.25              0.29             0.13
W9817       Yield (0.82)           0.06              0.06             0.08
W10010      Yield (0.82)           0.12              0.11             0.14
W×L         Yield (0.65)           0.34              0.34             0.16
W×L         SpGr (0.85)            0.33              0.33             0.19
a Mean reliability of the validation data. Gdip = G matrix based on diploidized marker data. SpGr = Specific Gravity.
[Figure 1: scatter plot of G (vertical axis, −0.50 to 1.00) vs. A (horizontal axis, 0.00 to 1.00); image not reproduced.]
[Figure 2: scatter plot of D (vertical axis, −0.25 to 1.00) vs. G (horizontal axis, −0.50 to 1.00); image not reproduced.]
[Figure 3: AIC (vertical axis, −50 to 0) by trait (Yield, Specific Gravity, Fry Color) for models A, G, G+GG, G+D, and G+GG+D; image not reproduced.]
[Figure 4: Proportion of Genetic Variance (vertical axis, 0.00 to 1.00) for models A, G, G+GG, G+D, and G+GG+D, in panels Yield, Specific Gravity, and Fry Color; legend: Vr, Vd, Vaa, Va; image not reproduced.]
[Figure 5: left panel, Accuracy (0.2 to 0.6); right panel, Regression Coefficient (0.5 to 1.1); traits Yield, SpGr, and Fry Color; models A, G, G+GG, G+D, and G+GG+D; image not reproduced.]
File S1: Tables S1–S2, Figures S1–S3, and Supporting Methods
Genetic variance partitioning and genome-wide prediction with allele dosage information in autotetraploid potato
Endelman et al.
Table S1 Number of new vs. returning clones by year in the training population.
Year  No. new clones  No. returning clones
2012  78              0
2013  59              45
2014  82              58
2015  43              56
2016  128             52
2017  181             67
Table S2 Genetic variance estimates (and SE) for the G+GG+D model, as a proportion of total genetic variance.
     Yield        Specific Gravity  Fry Color
Va   0.45 (0.13)  0.20 (0.10)       0.45 (0.17)
Vaa  0.03 (0.24)  0.51 (0.22)       0.44 (0.23)
Vd   0.07 (0.10)  0.00 (NA a)       0.00 (NA)
Vr   0.45 (0.20)  0.29 (0.18)       0.12 (0.17)
a REML solution on the boundary
Figure S1 Distribution of pedigree depth for the training population of 571 clones. Pedigree depth is the maximum number of generations to a founder.
[Figure S1: histogram of Number of Clones (vertical axis, 0 to 140) by Pedigree Depth (horizontal axis, 1 to 13); image not reproduced.]
Figure S2 Comparison of the observed (red circle) vs. expected (black line) frequency of heterozygotes under random mating, for the 3895 SNPs used in this study. For allele frequency p (q = 1 – p), the heterozygote frequency at panmictic equilibrium is 4p³q + 6p²q² + 4pq³ for tetraploids compared with 2pq for diploids (Gallais 2003).
[Figure S2: Frequency of Heterozygous Genotypes (vertical axis, 0.0 to 1.0) vs. Allele Frequency (horizontal axis, 0.0 to 1.0); legend: Tetraploid, Diploid; image not reproduced.]
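The expected heterozygote frequencies quoted in the Figure S2 caption are easy to evaluate directly. The following minimal sketch, with hypothetical function names, computes both expressions and can be used to check that the tetraploid expression equals one minus the two homozygote frequencies, p⁴ and q⁴.

```python
def het_freq_tetraploid(p):
    """Expected heterozygote frequency for a tetraploid at panmictic equilibrium."""
    q = 1.0 - p
    return 4 * p**3 * q + 6 * p**2 * q**2 + 4 * p * q**3


def het_freq_diploid(p):
    """Expected heterozygote frequency for a diploid under random mating."""
    return 2 * p * (1.0 - p)


if __name__ == "__main__":
    for p in (0.1, 0.25, 0.5):
        print(f"p = {p}: tetraploid {het_freq_tetraploid(p):.4f}, "
              f"diploid {het_freq_diploid(p):.4f}")
```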
Figure S3 Effect of population size (N) on prediction accuracy within the training set. The points at N = 403 correspond to using all clones with pedigree depth < 12 to predict clones with pedigree depth ≥ 12, using the G+GG+D model. The points at N = 100, 200, 300 are the mean accuracy for 200 random subsets, and the error bars show ± 1 standard error.
[Figure S3: Accuracy (vertical axis, 0.2 to 0.6) vs. Training Population Size (horizontal axis, 100 to 400); legend: Yield, Specific Gravity, Fry Color; image not reproduced.]
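Supporting Methods
Normal Equations for the Additive Effects
Taking the derivative of Eq. 2 with respect to an arbitrary allele t, and setting the result equal to zero:
\[
\begin{aligned}
0 = {} & -2 \sum_{jkl} p_t p_j p_k p_l \left[ z_{tjkl} - (\alpha_t + \alpha_j + \alpha_k + \alpha_l) \right] \\
& -2 \sum_{ikl} p_i p_t p_k p_l \left[ z_{itkl} - (\alpha_i + \alpha_t + \alpha_k + \alpha_l) \right] \\
& -2 \sum_{ijl} p_i p_j p_t p_l \left[ z_{ijtl} - (\alpha_i + \alpha_j + \alpha_t + \alpha_l) \right] \\
& -2 \sum_{ijk} p_i p_j p_k p_t \left[ z_{ijkt} - (\alpha_i + \alpha_j + \alpha_k + \alpha_t) \right]
\end{aligned}
\tag{S1}
\]
Because genotypic value is unchanged upon permutation of the indices, each of the terms in Eq. S1 is identical, which leads to the result:
\[
0 = \sum_{jkl} p_j p_k p_l \left[ z_{tjkl} - (\alpha_t + \alpha_j + \alpha_k + \alpha_l) \right]
\tag{S2}
\]
Rearranging Eq. S2, and using the identity $\sum_j p_j = 1$, leads to
\[
\sum_{jkl} p_j p_k p_l z_{tjkl} = \alpha_t + 3 \sum_j p_j \alpha_j
\tag{S3}
\]
Multiplying Eq. S3 by $p_t$, summing over t, and using the interchangeability of the indices, generates
\[
\sum_{tjkl} p_t p_j p_k p_l z_{tjkl} = 4 \sum_j p_j \alpha_j
\tag{S4}
\]
From the definition of z (Eq. 1), the left side of Eq. S4 is zero, so the right side is also zero, which completes the proof of Eq. 3.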
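The argument above can be checked numerically. The sketch below is an illustration under stated assumptions, not part of the original analysis: it draws arbitrary genotypic values that are symmetric in the four allele indices, takes z as the deviation from the frequency-weighted mean, computes the additive effects from Eq. S3 with the constraint applied, and then evaluates the constraint and the normal equations (Eq. S2). The function names, the number of alleles, and the random seed are all hypothetical choices.

```python
import itertools
import random


def random_genotypic_values(n_alleles, rng):
    """Random tetraploid genotypic values, invariant to permutation of the indices."""
    G = {}
    for combo in itertools.combinations_with_replacement(range(n_alleles), 4):
        value = rng.gauss(0.0, 1.0)
        for perm in set(itertools.permutations(combo)):
            G[perm] = value
    return G


def additive_effects(G, p):
    """Return (mu, alpha), with alpha_t the weighted mean of z over the other three alleles."""
    idx = range(len(p))
    mu = sum(p[i] * p[j] * p[k] * p[l] * G[i, j, k, l]
             for i, j, k, l in itertools.product(idx, repeat=4))
    alpha = [sum(p[j] * p[k] * p[l] * (G[t, j, k, l] - mu)
                 for j, k, l in itertools.product(idx, repeat=3))
             for t in idx]
    return mu, alpha


rng = random.Random(1)
raw = [rng.random() + 0.1 for _ in range(3)]
p = [x / sum(raw) for x in raw]          # allele frequencies summing to one
G = random_genotypic_values(3, rng)
mu, alpha = additive_effects(G, p)

# Constraint from the proof: sum_t p_t * alpha_t should be zero
constraint = sum(pt * at for pt, at in zip(p, alpha))

# Normal equations (Eq. S2), one value per allele t, should all be zero
idx = range(len(p))
normal_eq = [sum(p[j] * p[k] * p[l]
                 * (G[t, j, k, l] - mu - (alpha[t] + alpha[j] + alpha[k] + alpha[l]))
                 for j, k, l in itertools.product(idx, repeat=3))
             for t in idx]

if __name__ == "__main__":
    print("constraint:", constraint)
    print("normal equations:", normal_eq)
```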
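Normal Equations for the Digenic Dominance Effects
Taking the derivative of Eq. 5 with respect to an arbitrary allele pair (s,t), setting the result equal to zero, and using the interchangeability of the indices, generates
\[
0 = \sum_{kl} p_k p_l \left[ y_{stkl} - (\beta_{st} + \beta_{sk} + \beta_{sl} + \beta_{tk} + \beta_{tl} + \beta_{kl}) \right]
\tag{S5}
\]
where $y_{stkl}$ denotes the residual genotypic value after removing the population mean and the additive effects. Upon rearranging and using the interchangeability of indices, Eq. S5 becomes
\[
\sum_{kl} p_k p_l y_{stkl} = \beta_{st} + 2 \sum_k \beta_{sk} p_k + 2 \sum_k \beta_{tk} p_k + \sum_{kl} \beta_{kl} p_k p_l
\tag{S6}
\]
The left side of this equation equals $\bar{G}_{st\cdot\cdot} - \mu - \alpha_s - \alpha_t$. Multiplying by $p_t$ and summing over t (at fixed s), the left side becomes zero, and the right side becomes
\[
0 = 3 \sum_k \beta_{sk} p_k + 3 \sum_{kl} \beta_{kl} p_k p_l
\tag{S7}
\]
Multiplying Eq. S7 by $p_s$ and summing over s leads to
\[
0 = 6 \sum_{sk} \beta_{sk} p_s p_k
\tag{S8}
\]
Eq. S8 shows that the second term in Eq. S7 is zero, which implies the first term is also zero, which completes the proof of Eq. 6.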
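A numerical check of this result is sketched below, under the assumption that the residual entering the digenic normal equations is the genotypic value minus the mean and the four additive effects. Once the constraint holds, the rearranged normal equation reduces to a weighted mean of that residual over the remaining two alleles, which is how the digenic effects are computed here. Variable names, the number of alleles, and the random seed are hypothetical.

```python
import itertools
import random

rng = random.Random(7)
n_alleles = 3
idx = range(n_alleles)
raw = [rng.random() + 0.1 for _ in idx]
p = [x / sum(raw) for x in raw]  # allele frequencies summing to one

# Arbitrary genotypic values, symmetric in the four allele indices
G = {}
for combo in itertools.combinations_with_replacement(idx, 4):
    value = rng.gauss(0.0, 1.0)
    for perm in set(itertools.permutations(combo)):
        G[perm] = value

mu = sum(p[i] * p[j] * p[k] * p[l] * G[i, j, k, l]
         for i, j, k, l in itertools.product(idx, repeat=4))
alpha = [sum(p[j] * p[k] * p[l] * G[t, j, k, l]
             for j, k, l in itertools.product(idx, repeat=3)) - mu
         for t in idx]


def residual(s, t, k, l):
    """Genotypic value after removing the mean and the additive effects."""
    return G[s, t, k, l] - mu - (alpha[s] + alpha[t] + alpha[k] + alpha[l])


# Digenic effects: weighted mean residual over the remaining two alleles
beta = [[sum(p[k] * p[l] * residual(s, t, k, l)
             for k, l in itertools.product(idx, repeat=2))
         for t in idx] for s in idx]

# Constraint being proved: sum_k p_k * beta_sk = 0 for every allele s
row_constraint = [sum(p[k] * beta[s][k] for k in idx) for s in idx]


def pair_sum(s, t, k, l):
    """Sum of the six digenic effects for genotype (s, t, k, l)."""
    return (beta[s][t] + beta[s][k] + beta[s][l]
            + beta[t][k] + beta[t][l] + beta[k][l])


# Normal equations for every allele pair (s, t); all entries should be zero
normal_eq = [[sum(p[k] * p[l] * (residual(s, t, k, l) - pair_sum(s, t, k, l))
                  for k, l in itertools.product(idx, repeat=2))
              for t in idx] for s in idx]

if __name__ == "__main__":
    print("row constraint:", row_constraint)
    print("normal equations:", normal_eq)
```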
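Derivation of Eq. 12
For a bi-allelic locus, the constraint Eq. 6 is a system of two equations with three digenic dominance parameters:
\[
\begin{aligned}
p \beta_{BB} + q \beta_{AB} &= 0 \\
p \beta_{AB} + q \beta_{AA} &= 0
\end{aligned}
\tag{S9}
\]
To uniquely determine the solution, we introduce the parameter $\beta \equiv \beta_{BB} - 2\beta_{AB} + \beta_{AA}$, which in combination with Eq. S9 produces the linear system:
\[
\begin{bmatrix} 1 & -2 & 1 \\ p & q & 0 \\ 0 & p & q \end{bmatrix}
\begin{bmatrix} \beta_{BB} \\ \beta_{AB} \\ \beta_{AA} \end{bmatrix}
=
\begin{bmatrix} \beta \\ 0 \\ 0 \end{bmatrix}
\tag{S10}
\]
Using Gaussian elimination, Eq. S10 can be reduced to
\[
\begin{bmatrix} 1 & -2 & 1 \\ 0 & q + 2p & -p \\ 0 & 1 & 0 \end{bmatrix}
\begin{bmatrix} \beta_{BB} \\ \beta_{AB} \\ \beta_{AA} \end{bmatrix}
=
\begin{bmatrix} \beta \\ -p\beta \\ -pq\beta \end{bmatrix}
\tag{S11}
\]
and back-substitution produces Eq. 12.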
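Derivation of Eq. 13
Using Eq. 12, the total digenic dominance $v_X$ for a clone with dosage X of the B allele is
\[
\begin{aligned}
v_4 &= 6\beta_{BB} = 6q^2\beta \\
v_3 &= 3\beta_{BB} + 3\beta_{AB} = (3q^2 - 3pq)\beta \\
v_2 &= 1\beta_{BB} + 4\beta_{AB} + 1\beta_{AA} = (q^2 - 4pq + p^2)\beta \\
v_1 &= 3\beta_{AB} + 3\beta_{AA} = (-3pq + 3p^2)\beta \\
v_0 &= 6\beta_{AA} = 6p^2\beta
\end{aligned}
\tag{S12}
\]
Replacing all instances of q with 1 – p, Eq. S12 becomes
\[
\begin{aligned}
v_4 &= (6p^2 - 12p + 6)\beta \\
v_3 &= (6p^2 - 9p + 3)\beta \\
v_2 &= (6p^2 - 6p + 1)\beta \\
v_1 &= (6p^2 - 3p)\beta \\
v_0 &= 6p^2\beta
\end{aligned}
\tag{S13}
\]
which is equivalent to Eq. 13.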
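The two bi-allelic derivations can be verified with exact rational arithmetic. The sketch below assumes that p is the frequency of the B allele, that q = 1 – p is the frequency of the alternative allele (labeled A here), and that the three digenic effects are q²β, –pqβ, and p²β for the BB, AB, and AA pairs. It counts the allele pairs in a genotype with dosage X and compares the total with the polynomial in p listed for each dosage. All names are hypothetical, and β is set to one without loss of generality.

```python
from fractions import Fraction
from math import comb


def digenic_effects(p):
    """(beta_BB, beta_AB, beta_AA) in units of beta, for B-allele frequency p."""
    q = 1 - p
    return q * q, -p * q, p * p


def v_from_pairs(X, p):
    """Total digenic dominance: sum over the six allele pairs of a tetraploid genotype."""
    b_BB, b_AB, b_AA = digenic_effects(p)
    n_BB = comb(X, 2)        # pairs of two B alleles
    n_AB = X * (4 - X)       # pairs of one B and one A allele
    n_AA = comb(4 - X, 2)    # pairs of two A alleles
    return n_BB * b_BB + n_AB * b_AB + n_AA * b_AA


# Coefficients (a, b, c) of a*p^2 + b*p + c for each dosage X, in units of beta
POLYNOMIALS = {4: (6, -12, 6), 3: (6, -9, 3), 2: (6, -6, 1), 1: (6, -3, 0), 0: (6, 0, 0)}


def v_from_polynomial(X, p):
    a, b, c = POLYNOMIALS[X]
    return a * p * p + b * p + c


if __name__ == "__main__":
    p = Fraction(1, 4)
    for X in range(5):
        print(X, v_from_pairs(X, p), v_from_polynomial(X, p))
```

Using `Fraction` avoids floating-point tolerance, so equality between the pair-counting form and the polynomial form can be tested exactly.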