Large-scale GWAS in sorghum reveals common genetic control of … · 50 Introduction Cereal...

37
Large-scale GWAS in sorghum reveals common genetic control of grain 1 size among cereals 2 Yongfu Tao 1* , Xianrong Zhao 1 , Xuemin Wang 1 , Adrian Hathorn 1 , Colleen Hunt 1,2 , Alan 3 W.Cruickshank 1,2 Erik J. van Oosterom 3 , Ian D. Godwin 3 , Emma S. Mace 1,2* , David R. 4 Jordan 1* 5 6 1 Queensland Alliance for Agriculture and Food Innovation (QAAFI), The University of 7 Queensland, Hermitage Research Facility, Warwick, QLD 4370, Australia 8 2 Agri-Science Queensland, Department of Agriculture and Fisheries (DAF), Hermitage 9 Research Facility, Warwick, QLD 4370, Australia. 10 3 Queensland Alliance for Agriculture and Food Innovation (QAAFI), The University of 11 Queensland, Brisbane, QLD 4072, Australia. 12 4 School of Agriculture and Food Sciences, University of Queensland, Brisbane, QLD, 13 Australia 14 15 * Corresponding authors: 16 Yongfu Tao: [email protected] 17 Emma Mace: [email protected] 18 David Jordan: [email protected] 19 ORCID ID: 20 Yongfu Tao: 0000-0001-9096-7407 21 Xuemin Wang: 0000-0002-2038-8829 22 Emma S. Mace: 0000-0002-5337-8168 23 David R. Jordan: 0000-0002-8128-1304 24 Ian D. Godwin: 0000-0002-4006-4426 25 Erik J van Oosterom: 0000-0003-4886-4038 26 . CC-BY-NC-ND 4.0 International license not certified by peer review) is the author/funder. It is made available under a The copyright holder for this preprint (which was this version posted July 22, 2019. . https://doi.org/10.1101/710459 doi: bioRxiv preprint

Transcript of Large-scale GWAS in sorghum reveals common genetic control of … · 50 Introduction Cereal...

  • Large-scale GWAS in sorghum reveals common genetic control of grain 1

    size among cereals 2

    Yongfu Tao1*, Xianrong Zhao1, Xuemin Wang1, Adrian Hathorn1, Colleen Hunt1,2, Alan 3

    W.Cruickshank1,2 Erik J. van Oosterom3, Ian D. Godwin3, Emma S. Mace1,2* , David R. 4

    Jordan1* 5

    6

    1 Queensland Alliance for Agriculture and Food Innovation (QAAFI), The University of 7

    Queensland, Hermitage Research Facility, Warwick, QLD 4370, Australia 8

    2 Agri-Science Queensland, Department of Agriculture and Fisheries (DAF), Hermitage 9

    Research Facility, Warwick, QLD 4370, Australia. 10

    3Queensland Alliance for Agriculture and Food Innovation (QAAFI), The University of 11

    Queensland, Brisbane, QLD 4072, Australia. 12

    4School of Agriculture and Food Sciences, University of Queensland, Brisbane, QLD, 13

    Australia 14

    15

    *Corresponding authors: 16

    Yongfu Tao: [email protected] 17

    Emma Mace: [email protected] 18

    David Jordan: [email protected] 19

    ORCID ID: 20

    Yongfu Tao: 0000-0001-9096-7407 21

    Xuemin Wang: 0000-0002-2038-8829 22

    Emma S. Mace: 0000-0002-5337-8168 23

    David R. Jordan: 0000-0002-8128-1304 24

    Ian D. Godwin: 0000-0002-4006-4426 25

    Erik J van Oosterom: 0000-0003-4886-4038 26

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    mailto:[email protected]:[email protected]://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Summary: 27

    Grain size is a key yield component of cereal crops and a major quality attribute. It is 28

    determined by a genotype’s genetic potential and its capacity to fill the grains. 29

    30

    This study aims to dissect the genetic architecture of grain size in sorghum via an 31

    integrated genome wide association study (GWAS) using a diversity panel of 837 32

    individuals and a BC-NAM population of 1,421 individuals. 33

    34

    In order to isolate genetic effects associated with grain size, rather than the genotype’s 35

    capacity to fill grain, a field treatment of removing half of the panicle during 36

    flowering was imposed. Extensive variation in grain size with high heritability was 37

    observed in both populations across 5 field trials. Subsequent GWAS analyses 38

    uncovered 92 grain size QTL, which were significantly enriched for orthologues of 39

    known grain size genes in rice and maize. Significant overlap between the 92 QTL 40

    and grain size QTL in rice and maize was also found, supporting common genetic 41

    control of this trait among cereals. Further analysis found grain size genes with 42

    opposite effect on grain number were less likely to overlap with the grain size QTL 43

    from this study, indicating the treatment facilitated identification of genetic regions 44

    related to the genetic potential of grain size rather than the capacity to fill the grain. 45

    46

    These results enhance understanding of the genetic architecture of grain size in cereal, 47

    and pave the way for exploration of underlying molecular mechanisms in cereal crops 48

    and manipulation of this trait in breeding practices. 49

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Introduction 50

    Cereal crops, including maize, rice, wheat, barley, and sorghum, supply more than 75% of the 51

    calories consumed by humans (Sands et al., 2009). They are critical to address global food 52

    security, which is under threat of population expansion and climate change. Among cereals, 53

    sorghum is an important source of food, fibre, feed, and biofuel, and provides staple food for 54

    over 500 million people in the semi-arid tropics of Africa and Asia. Cereal crops share a 55

    close ancestry, and strong gene co-linearity among them supports comparative genomics 56

    approaches for exploring the genetic bases of agronomical traits at a cross-species level 57

    (Bolot et al., 2009). Grain size is a key yield component of grain yield in cereal crops, crops 58

    and is a major quality attribute affecting planting, harvesting, and processing activities (Tao 59

    et al., 2017). A better understanding of the genetic basis of grain size, including underlying 60

    molecular mechanisms, will provide new targets for improving yield and grain quality in 61

    cereal breeding. 62

    Grain size in cereals is a function of both the potential maximum grain size and the capacity 63

    of the plant to fill the grain. The weight of individual grains is determined by the rate and 64

    duration of grain filling. In sorghum, the grain filling rate is highly correlated with the ovary 65

    volume at anthesis, which in turn is associated with the size of the meristematic dome during 66

    early floret development (Yang et al., 2009). While the potential maximum grain size is 67

    largely genetically pre-determined, the capacity to fill the grain is strongly affected by 68

    assimilate availability (Gambín & Borrás, 2010). Grains are the key sink for carbon demand 69

    post-anthesis, and the amount of available assimilate per grain is determined by grain number 70

    and assimilate supply, which are associated with a range of genetic and environmental factors 71

    (Gambín & Borrás, 2010; Wardlaw, 1990). Negative correlations between grain number and 72

    grain size are commonly observed in cereal crops (Acreche & Slafer, 2006; Peltonen-Sainio 73

    et al., 2007; Sadras, 2007; Tao et al., 2018). In sorghum, grain number reduction at early 74

    stages of flowering has been observed to result in a larger grain weight (Sharma et al., 2002). 75

    This indicates that demand for carbohydrate to fill grains exceeds the available supply. 76

    In cereals, the genetic architecture of grain size is complex and involves numerous genes. In 77

    rice and maize, around 110 genes affecting grain size have been cloned (Tao et al., 2017). 78

    Given the conservation of gene function among closely related cereal species, this presents 79

    opportunities for comparative genomics studies to investigate the genetics of grain size in 80

    other cereal crops using knowledge gained from gene cloning studies in maize and rice. The 81

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • sorghum QTL atlas of Mace et al. (2018) details around 100 grain weight QTL in sorghum 82

    that have been identified through linkage analysis in 18 studies (Paterson et al., 1995; 83

    Tuinstra et al., 1997; Rami et al., 1998; Brown et al., 2006; Feltus et al., 2006; Murray et al., 84

    2008; Srinivas et al., 2009; Phuong et al., 2013; Rajkumar et al., 2013; Reddy et al., 2013; 85

    Han et al., 2015; Mocoeur et al., 2015; Shehzad & Okuno, 2015; Gelli et al., 2016; Spagnolli 86

    et al., 2016; Bai et al., 2017; Boyles et al., 2017;Tao et al., 2018). A further 12 markers 87

    associated with grain weight have been identified in three GWAS studies (Upadhyaya et al., 88

    2012; Zhang et al., 2015; Boyles et al., 2016). Given the number of genomic regions 89

    identified using standard genetic linkage mapping approaches, the limited number of marker-90

    trait associations identified in the GWAS studies is likely due to the small population sizes 91

    employed in these studies (n = 242 - 390). All of the previous genetic studies on grain size in 92

    sorghum have focused on the dissection of the genetic basis of grain size under natural 93

    conditions, which also include environmental impacts and Genotype × Environment (G×E) 94

    interactions. However, minimisation of variation in grain-filling capacity could limit the 95

    environment effects on variation in grain size, and thereby facilitate identification of genetic 96

    factors underlying variation of this trait. 97

    This study aims to 1) investigate the genetic architecture of grain size in sorghum by 98

    conducting an integrated GWAS in two large populations, a diversity panel of 837 99

    individuals and a backcross nested association mapping population (BC-NAM) population of 100

    1421 individuals, and 2) compare genetic loci affecting grain size identified in this study to 101

    those reported in rice and maize. To minimize variation in grain size caused by assimilate 102

    variability, a field treatment of removing half of the panicle during flowering time was 103

    imposed in this study in order to maximise assimilate availability for the remaining seeds 104

    during grain filling. 105

    106

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Materials and methods 107

    Plant material: 108

    Two populations, together comprising over 2000 individuals, were used in this study: a 109

    diversity panel (DP, n=837) (Table S1) and a BC-NAM consisting of 30 interrelated families 110

    (n=1421) (Figure S1A; Table S2). The diversity panel has around 225 genotypes in common 111

    with the US sorghum association panel. It consists predominantly of lines developed by the 112

    sorghum conversion program conducted by Texas Agricultural Experiment Station, which 113

    took diverse sorghum lines from the world collection and converted tall, late, or photoperiod 114

    sensitive sorghums from the tropics into short, early, photoperiod insensitive types that could 115

    be used by breeders in temperate regions (Rosenow et al., 1997). The program involved 116

    repeated backcrossing to the exotic line combined with selection for height and maturity in 117

    temperate environments. The resulting material has been reported to contain >4% genome 118

    introgression from the temperate donor with the remainder from the exotic parent and has a 119

    much narrower range of height and maturity than the original exotic lines (Thurber et al., 120

    2013). 121

    The BC-NAM population was previously developed by the Department of Agriculture and 122

    Fisheries (DAF), Queensland, Australia and described in (Jordan et al., 2011). In summary, 123

    each BC-NAM family was produced by crossing a single elite parent (R931945-2-2 or 124

    R986087-2-4-1) with a diverse, exotic line (the non-recurrent parent) and backcrossing the 125

    resulting F1 to the elite parent to produce a large BC1F1 population. Variable numbers of 126

    plants were selected from each BC1F1 family and using single-seed descent >5 generations of 127

    self-pollination were generated to produce BC1F6 and beyond. During generation advance, 128

    selection was imposed for height and maturity of the recurrent parental type. 129

    The 30 BC-NAM families used in the current study are listed in Table S2. In total 27 unique 130

    non-recurrent parental lines and 2 elite recurrent parental lines were used for population 131

    development. These included 10 diverse lines from advanced breeding programs and 17 132

    landraces, 7 of which were converted to temperate adaptation through the Sorghum 133

    Conversion Program (Stephens et al., 1967). Fourteen of the 27 non-recurrent parental lines 134

    were also included in the diversity panel (Figure S1B). 135

    Field Trials and phenotypic evaluation 136

    A total of 5 field trials were planted at Hermitage Research Facility (HER), Warwick, 137

    Queensland, Australia (28°12ʹS, 152°5ʹE, 470 m above sea level) and Gatton Research 138

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Facility (GAT), Gatton, Queensland, Australia (27°33ʹS, 152°20ʹE, 94 m above sea level) 139

    during the Australian summer cropping seasons of 2014/15 and 2015/16. Two trials of the 140

    diversity panel were grown in the 2015/16 season at Hermitage (DPHER16) and Gatton 141

    (DPGAT16). The BC-NAM were planted in three trials at Hermitage in 2014/15 142

    (NAMHER15), and Gatton in 2014/15 (NAMGAT15) and 2015/16 (NAMGAT16). All trials 143

    were planted between November and February, using a row column design with partial 144

    replication. Each plot consisted of two 6 m rows with a row spacing of 0.75 m. Differences in 145

    plant height within a plot was minimal. Different numbers of genotypes were grown in each 146

    trial due to seed availability (Table S3). Standard agronomic practices and pest-control 147

    practices were applied. 148

    A treatment of removing half of each of two panicles in each plot was imposed when each 149

    panicle commenced flowering, before significant grain development had occurred (Figure 150

    1A). Once physiological maturity was reached, the remaining half panicle (hereafter, referred 151

    as half head) was hand harvested and threshed using a mechanical threshing machine. In 152

    DPGAT16, a full head panicle in each plot was also harvested and threshed for comparison 153

    with the half head panicles from the same plot. An aspirator was used to remove any debris 154

    from each sample before measurements were taken. Although sorghum grain is typically 155

    tending toward spherical, considerable phenotypic variation in length, width, and thinkness 156

    does exist. Therefore, grain size parameters, including thousand kernel weight (TKW), and 157

    the length, width, thickness, and volume of grains were measured using Seedcount SC5000 158

    (Next Instruments, Condell Park, NSW, Australia) and a digital balance. 159

    Statistical analysis 160

    Due to the large number of genotypes included in this study, partial replication was used in 161

    all trial designs (Cullis et al., 2006), with 30 % of the genotypes replicated two or more times 162

    and the remaining 70 % represented by single plots. The total number of plots in each trial 163

    ranged from 880 to 1521, with the total number of genotypes planted in each trial ranging 164

    from 658 to 1164 (Table S3). A customized design was used to minimise spatial error effects 165

    within each trial. The concurrence of genotypes and populations across the two seasons 166

    allowed the DP and BC-NAM trials to be analysed as two multi-environment trials (METs) 167

    for each of the five measured grain size parameters, comprising of two trials for the DP and 168

    three for the BC-NAM. Each MET was analysed by fitting a linear mixed model using the 169

    package ASReml (Butler et al., 2009) and the R statistical software. Each model consisted of 170

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • a fixed effect for the targeted trait at each trial, random effects for genotype within trial, and 171

    spatial error for each individual trial (Smith et al., 2001). The G×E interaction was analysed 172

    by fitting a third-order factor analytic structure to the trial × genotype interaction, in this case 173

    the structure also modelled the 3-way interaction between site, trait, and genotype. The 174

    analysis resulted in genetic variances for each trial and loading values representing factor 175

    analytic loadings. 176

    Generalised repeatability estimates were calculated for grain size parameters using the 177

    method proposed by Cullis et al. (2006). High correlations were observed among grain size 178

    parameters, and hence a principal components analysis was conducted to further investigate 179

    the interrelationships among TKW and grain length, width, and thickness (Figure S3 and S4). 180

    Because it is possible that a constructed trait such as a principal component may explain the 181

    data more effectively, the main principal components were used as additional traits in the 182

    association analysis. 183

    Genotyping and imputation 184

    The 2 populations were genotyped using medium to high density genome-wide SNPs 185

    provided by Diversity Arrays Technology Pty Ltd (www.diversityarrays.com). DNA was 186

    extracted from bulked young leaves of five plants from a plot of each genotype in the 2 187

    populations using a previously described CTAB method (Doyle, 1987). The DNA samples 188

    were then genotyped following an integrated DArT and genotyping-by-sequencing (GBS) 189

    methodology, which involves complexity reduction of the genomic DNA to remove repetitive 190

    sequences using methylation sensitive restriction enzymes prior to sequencing on Next 191

    Generation sequencing platforms. The sequence data generated were then aligned to version 192

    v3.1.1 of the sorghum reference genome sequence (McCormick et al., 2018) to identify SNPs 193

    (Single Nucleotide Polymorphisms). 194

    In total, 111,089 SNPs were identified in the diversity panel and 31,478 SNPs in the BC-195

    NAM population. The overall proportion of missing data reported in the raw genotypic data 196

    sets was approximately 10% and 5% for the diversity panel and BC-NAM respectively. 197

    Individual SNP markers with >50% missing data were removed from further analysis and the 198

    remaining missing values were phased and imputed using Beagle v4.1 (Browning & 199

    Browning, 2016). An average imputation accuracy of 96% was achieved across both 200

    populations. 201

    Population statistics, GWAS analysis and QTL identification 202

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    http://www.diversityarrays.com/https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Pairwise linkage disequilibrium (LD) (r2) was calculated using PopLDdecay (Zhang et al., 203

    2018) for the diversity (Figure S4A) and the BC-NAM (Figure S1C). The population 204

    structure in the diversity panel was analysed used LEA in R (François, 2016). The structure 205

    analysis identified five groups that corresponded to 4 racial groups (kafir, caudatum, guinea, 206

    durras of Asian origin, durras of East African origin) (Figure S4B) based on previous racial 207

    information and geographical origin for a subset of lines. Each individual line with ≥95 % of 208

    genetic identity from one of the five identified groups was designated as a representative of 209

    the corresponding racial group (Figure S4C, Table S4). The remaining lines were considered 210

    as admixtures of multiple groups. 211

    For the GWAS analysis, imputed SNP data sets for both populations were filtered for minor 212

    allele frequency (MAF) >0.005. The final number of SNPs used for GWAS was 56,412 for 213

    the diversity panel (Figure S5) and 26,926 for the BC-NAM population (Figure S6). For the 214

    diversity panel, a principal component analysis (PCA) was conducted using a pruned SNP 215

    data set to control for population structure. Marker pruning was conducted using PLINK 216

    (Purcell et al., 2007) with a sliding window of 100 SNPs , a step size of 50 SNPs and r2 of 217

    0.5. For the BC-NAM population, pedigree was used to control population structure. GWAS 218

    was performed using FarmCPU (Liu et al., 2016) for both populations for the five measured 219

    grain size parameters and derived PCs. Thresholds for defining significant and suggestive 220

    associations were set using the Bonferroni correction based on the number of independent 221

    tests, calculated using the GEC software package (Li et al., 2012). Marker-trait associations 222

    were then identified using a two-step process. Firstly, SNPs were identified that were 223

    significantly associated with grain size parameters and derived PCs in both populations 224

    separately (p

  • population structure accounted for by PCs for the diversity panel and by pedigrees in the BC-236

    NAM population. 237

    A candidate region based approach was used to further investigate associations between 238

    population specific QTL and grain size parameters in the alternative population. SNPs within 239

    a 0.5 cM window of a population specific QTL were extracted from the alternative 240

    population. To examine associations between SNPs in the candidate regions and the grain 241

    size parameters, GLM in TASSEL 5.0 was run, using either PCA in the diversity panel or 242

    pedigree in the BC-NAM population as a covariate to control for population structure 243

    (Bradbury et al., 2007). The Bonferroni correction (0.05/number of markers) was used to 244

    control for false positives. 245

    Using the BC-NAM population, allele effects of the exotic parent were estimated compared 246

    to the recurrent parent. At each QTL, the numbers of exotic alleles with larger and smaller 247

    values than the recurrent parent were determined, and a test conducted to assess if the 248

    numbers of positive and negative alleles relative to the recurrent parent varied. Families with 249

    20 cM were excluded during this step. 255

    Candidate gene analysis 256

    Candidate genes for grain size in sorghum were identified using the methods described by 257

    Tao et al. (2017). Firstly, genes functionally determined to control grain size in rice, maize, 258

    and sorghum were collated through a literature search (Table S6). Subsequently, 259

    corresponding sorghum orthologous of grain size genes in rice and maize were identified 260

    using a combination of syntenic and bidirectional best hit (BBH) approaches. Syntenic gene 261

    sets among rice, maize and sorghum were downloaded from PGDD 262

    (http://chibba.agtec.uga.edu/duplication/) to identify sytenic orthologues of known grain size 263

    genes from rice and maize. Local blast was performed to identify best BLAST hits of pairs of 264

    genes from two genomes, using BLASTP. 265

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    http://chibba.agtec.uga.edu/duplication/https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Results 266

    Minimizing environmental impact on grain size through the removal of half of the 267

    panicle during flowering time 268

    In the DPGAT16 trial, a significant increase in average grain size of 8.54% was observed in 269

    half heads compared to full heads (Figure 1B). The magnitude of this increase varied 270

    significantly among genotypes, indicating the presence of genotypic variation in the degree of 271

    source limitation for grain filling in the full head (Figure 1C and 1D). Nonetheless, TKW of 272

    half-head and full-head treatments were highly correlated (r2~0.70) (Figure S7). Since the 273

    greater average TKW of the half head treatment compared with the full head treatment 274

    indicated that this treatment was effective in removing some of the variation in grain size due 275

    to restrictions in grain-filling capacity, data collected from plants with half head phenotypes 276

    were used in this study. 277

    Phenotypic variation of grain size 278

    Substantial variation in TKW and the volume, length, width, and thickness of grains was 279

    observed in half head samples across the five trials and two populations (Tables 1 and 2, 280

    Figure 2). The diversity panel exhibited much broader variation in these grain size parameters 281

    than the BC-NAM population, reflecting its larger genetic diversity. For example, in the trials 282

    conducted at Gatton in 2016, the range in TKW was 6.5-58.4g in the diversity panel, 283

    compared to 16.9-44.5g in the BC-NAM population. All parameters had low levels of G×E 284

    interaction and were highly correlated across trials in both the diversity panel (r>0.82, 285

    average r=0.85) and the BC-NAM population (r>0.79, average r=0.89) (Tables 1 and 2). 286

    Hence, the combined cross-trial BLUPs were used for subsequent analyses of these 287

    parameters. The high cross-trial correlations of the individual grain size parameters were 288

    consistent with their medium to high heritability (Table 1 and 2). 289

    All five grain size parameters displayed near normal distributions in both populations (Figure 290

    2), which in combination with the medium to high heritability and low G×E observed, 291

    indicated that variation in the parameters was controlled by multiple genetic loci. High 292

    correlations among the five grain size parameters were also observed in both populations, 293

    implying a high degree of shared genetic control among them (Figure 2). The principal 294

    components analysis determined that 3 PCs accounted for the majority (>85%) of the 295

    variation in the grain size parameters observed across multiple trials in both populations 296

    (Figure S2 and S3). The direction of the correlations between traits was positive in all 297

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • contrasts with the exception of grain thickness, which displayed negative correlations with 298

    the other four grain size parameters in the BC-NAM population (r=-0.39), as opposed to 299

    positive correlations (r=+0.44) in the diversity panel (Figure 2). 300

    To investigate possible correlations between racial groups and grain size parameters in 301

    sorghum, structure analysis of the diversity panel was conducted, which revealed 5 groups 302

    that corresponded to different racial groups of sorghum (Figure S4B, Table S4). All grain size 303

    parameters showed significant differences among these racial groups (ANOVA, p-value 304

    80%) of these 54 QTL were 312

    significantly associated with multiple grain size parameters (Table S9). However, phenotypic 313

    variation explained by individual QTL was generally small and the majority of QTL 314

    explained < 5% of phenotypic variation each. In the BC-NAM population, 131 significant 315

    marker-trait associations were identified with 106 SNPs associated with between 1 - 4 grain 316

    size parameters (Figure 4, Figure S9, and Table S10). Because the extent of LD was greater 317

    in the BC-NAM than in the diversity panel (Figure S1C), a 2 cM window was used to cluster 318

    these associated SNPs into 66 QTL regions. Similar to observations for the diversity panel, 319

    the vast majority (~85%) of the QTL were significantly associated with multiple grain size 320

    parameters and phenotypic variation explained by individual QTL was generally small, with 321

    the majority of QTL explaining < 5% of the phenotypic variation of a given grain size 322

    parameter (Table S11). Within the BC-NAM, it was possible to observe the distribution of 323

    allelic effects of the exotic parents compared to the recurrent parent (Figure 5). In general, 324

    alleles were observed that were both larger and smaller than the elite parent, which was 325

    supportive of the presence of multiple alleles at the majority of QTL. 326

    A comparison of the QTL identified initially in both populations separately revealed 28 QTL 327

    in common, with 26 QTL being unique to the diversity panel and 38 QTL unique to the BC-328

    NAM population (Figure 6), resulting in 92 QTL identified in total across populations. These 329

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • 92 QTL were distributed across all 10 sorghum chromosomes (Table 3). Further examination 330

    revealed that the majority of the population specific QTL (22 out of 26 QTL unique to the 331

    diversity panel and 34 out of 38 QTL specific to BC-NAM population) were significantly 332

    associated with grain size parameters in the alternative population using the candidate region 333

    based association analysis (Table S12). 334

    The genetic basis of grain size has been the focus of 18 previous studies in sorghum, which 335

    used 21 bi-parental populations and reported 85 grain weight QTL (Table S5). Nearly three 336

    quarters (62) of these previously reported QTL co-located with QTL identified in this study. 337

    A significant overlap between the QTL identified in this study and grain size SNPs identified 338

    in rice (Huang et al., 2012) was observed, with 60% of the rice grain size SNPs identified 339

    within a 1cM window of the sorghum QTL identified in this study (p-value

  • reported to have an opposite effect on grain number and grain size were underrepresented in 362

    the overlap with the QTL from this study (Table S15). 363

    One of the candidate genes, SbGS3, the orthologue of the rice grain size gene, GS3 (Mao et 364

    al., 2010), co-located with qGS1.9, a QTL identified in both the diversity and BC-NAM 365

    populations, for grain length, weight, volume, width, and thickness. In the diversity panel, 366

    one SNP was identified that caused a C to A change in the fifth exon of SbGS3, turning a 367

    cysteine codon (TGC) to a premature stop codon (TGA), resulting in the 198 amino acid 368

    protein being truncated at the 140th amino acid. However, the most significant SNP identified 369

    for qGS1.9 was 11.48kb (0.01cM) away from SbGS3, rather than the large-effect SNP 370

    causing truncation at the 140th amino acid. This may have been due to the low MAF (22 out 371

    of 837 individuals carried the premature stop codon allele) and the similar genetic 372

    background for the individuals carrying the premature stop codon allele. Transcriptomics 373

    expression data (Makita et al., 2015) showed that GS3 had comparable expression patterns in 374

    sorghum and rice, with high expression levels observed in the panicle and minimal 375

    expression in other tissue types covering a range of temporal and spatial variation (Figure 376

    S10A). Additional transgenic experiments, using RNAi, have demonstrated that inhibiting the 377

    expression of SbGS3 increased grain weight by 9.4% on average in sorghum (Trusov et al., 378

    pers. comm.). Both positive and negative allelic effects from non-recurrent parental lines 379

    were observed in the BC-NAM compared with the recurrent parental line, R931945-2-2, 380

    indicating the existence of multiple alleles of SbGS3 in the population (Figure S10 B). 381

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Discussion 382

    Grain size is one of the most critical traits of cereal crops due to its direct contribution as a 383

    yield component, its importance as a quality attribute, and its contribution to fitness 384

    stemming from its impact on reproductive rates, emergence and establishment (Tao et al., 385

    2017; Westoby et al., 1992). This study represents one of the largest and most comprehensive 386

    investigations of the genetic basis underling this trait in any cereal via a joint GWAS analysis 387

    of grain size in sorghum using a diversity panel (n=837) and a BC-NAM population 388

    (n=1421). The population size is the largest of all GWAS studies on grain size in any cereal 389

    crop, >5 times larger than the second largest GWAS studies in sorghum. As a result, a total 390

    number of 92 grain size QTL were identified. Enrichment of grain size candidate genes 391

    within these QTL was observed, and significant overlap between these 92 QTL and SNPs 392

    associated with grain size in rice and maize was found, highlighting common genetic control 393

    of this trait among cereals. In addition, the treatment of removing half the panicle during 394

    flowering facilitated minimisation of variation of grain size due to assimilate variability, and 395

    identification of genetic regions related to the genetic potential of grain size rather than the 396

    capacity to fill the grain. Therefore, these results provide an enhanced understanding of the 397

    genetic basis underlying grain size in sorghum and offer opportunities for breeders in 398

    sorghum and other cereal species to manipulate grain size to improve crop productivity and 399

    quality. 400

    The relationship between grain size and grain number in sorghum is mainly driven by 401

    genes acting before grain filling 402

    A negative correlation between grain size and number is widely observed in cereals The 403

    nature of this correlation is complex (Sadras 2007) and is partially driven by the fact that 404

    potential maximum grain size and grain number are established before grain filling and by the 405

    availability of assimilates to enable seeds to reach their genetic potential (Yang et al., 2009; 406

    Gambín & Borrás, 2010). In this study, the half-head treatment resulted in an average 407

    increase in TKW of 8.54%, with significant variation among genotypes. The average change 408

    in seed weight is in line with previous estimates of sorghum’s capacity to compensate for loss 409

    of seeds (Sharma et al., 2002). Seed weight in the full-head treatment explained ~70% of the 410

    genetic variation in grain size in the half-head treatment, indicating that while assimilate 411

    availability is a significant factor influencing grain size, most of the genetic factors driving 412

    the correlation between grain size and number in sorghum are acting prior to grain filling. 413

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Future yield gains may require that this association is better understood to determine if the 414

    negative association can be broken at some or all QTL. 415

    Shared genetic control of grain size dimensions 416

    Sorghum grains vary in terms of length, thickness, width, volume and weight. Substantial 417

    genetic variation was observed for all of these parameters in both populations. Cross 418

    environment correlations of these grain size parameters within the same population were very 419

    high and the heritability of the traits was also moderate to high, indicating strong genetic 420

    control as previously observed (Prado et al., 2014; Tao et al., 2018). These findings, 421

    combined with a normal distribution of the phenotypes, indicate that all of these parameters 422

    are likely to be controlled by many genes with small to moderate effects. In addition, strong 423

    correlations were observed between the different grain size parameters in the same 424

    population, indicating they have a high degree of common genetic control. This was expected 425

    for parameters such as volume and weight and their correlations with the various measured 426

    dimensions, since the traits are linked mathematically. However, strong correlations between 427

    most of these parameters in both of the two populations indicate shared genetic control. In 428

    contrast to positive correlations between other grain size parameters in both populations, the 429

    grain thickness trait was negatively correlated with the other 4 parameters in the BC-NAM 430

    population but positively correlated with these parameters in the diversity panel. This is likely 431

    due to the different genetic backgrounds of the two populations. The diversity panel has a 432

    good representation of sorghum’s racial groups (Thurber et al., 2013). Although the BC-433

    NAM also provides a good representation of the genetic variation of the racial groups of 434

    sorghum, because of the population structure (backcross derived based on a single reference 435

    parent), the genetic composition of each line is strongly biased towards the genetic 436

    background of the elite recurrent parent, which is primarily of caudatum origin. 437

    High-confidence genetic architecture of grain size revealed through independent multi-438

    population GWAS 439

    Population size is a key factor affecting the power of a GWAS study, as is the need to control 440

    for type I and type II errors (Visscher et al., 2017). The two populations used in this study, 441

    taken individually, are among the largest published to date in cereal crop on grain size. This 442

    size, coupled with the capacity to independently verify putative QTL in the alternative 443

    population, has provided us with a substantial advantage over previously published studies. 444

    The diversity panel contains representatives of most of the racial groups in the sorghum gene 445

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • pool and is characterised by its fast LD decay. In contrast, the BC-NAM population consists 446

    of BC1 derived lines sharing ~80% of their genomes with the elite reference parent (Jordan et 447

    al., 2011), which provides an opportunity to assess exotic alleles in the genetic background of 448

    an elite inbred. The BC-NAM population has lower genetic diversity, a more balanced 449

    structure and greater LD and therefore potentially more power to detect QTL. The joint use of 450

    these complementary populations has also provided a more comprehensive understanding of 451

    the genetic architecture of grain size, and ready-to-use information on the effect of QTL 452

    alleles in elite breeding material. 453

    The high number (92) of QTL identified in this study and their relatively small individual 454

    effects highlight the genetic complexity of grain size. Over half of the QTL identified in this 455

    study (49) were co-located with QTL identified in previous sorghum studies (Table 3) with < 456

    30% of the previous QTL not detected in this study (Table S5). Care must be taken with 457

    interpreting this result however, since the small population sizes of the previous studies 458

    limited their power to detect QTL and resulted in large confidence intervals to the extent that 459

    they often encompassed a number of the QTL detected in this study. In addition, two of the 460

    previous studies identified QTL from crosses between wild and domestic sorghum (Paterson 461

    et al., 1995; Tao et al., 2018), which may not have been detected in our study as wild species 462

    were not included in either population, plus none of the other studies attempted to minimise 463

    variation in the capacity to fill grain. Hence, it is likely that some of the QTL previously 464

    detected may not be segregating in our study. 465

    Our study was internally consistent, with 83% of the QTL detected being associated with 466

    multiple traits, in line with the correlations observed between the phenotypes. In addition, 467

    91% of the QTL were supported by evidence from the two independent populations either via 468

    co-located QTL or with support from the candidate region analysis, resulting in only 8 469

    population-specific QTL being identified. Given that the two populations were independent 470

    and differed in both genetic composition, power and LD, the high correspondence between 471

    the two studies suggests that we identified the majority of the loci for grain size in cultivated 472

    sorghum. 473

    Significant allelic diversity exists in grain size QTL 474

    The BC-NAM population provided us with the opportunity to investigate variation in the 475

    effects of different QTL alleles in an elite genetic background. Our results suggest the 476

    presence of multiple alleles at the majority of the QTL with effects of the non-recurrent 477

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • parental alleles that were both greater and smaller that the recurrent parent allele observed at 478

    most loci (Figure 5). The distribution of the allele effects provides interesting insights into 479

    selection for the grain size trait in an elite breeding program. There was weak tendency for 480

    the elite parental alleles to contribute to larger grain size compared to the exotic parent. This 481

    is similar to an earlier study of flowering time in the same population where positive and 482

    negative allele effects relative to the recurrent parental allele were similarly distributed (Mace 483

    et al., 2013). The lack of bias observed in the current study suggests modern sorghum 484

    breeding with its focus on increasing grain yield has not selected for grain size. The lack of 485

    strong selection for grain size by modern breeders is interesting and suggests a potential 486

    physiological restriction to increasing grain yield by selection for larger grain. This is 487

    consistent with the observation that much of the progress in enhancing grain yield in modern 488

    breeding of cereals has been achieved by selection for increased grain number rather than 489

    grain size. Clearly, further research is required before breeders attempt to increase grain yield 490

    by introducing alleles for larger grain size. However, given the range of allele effects, it is 491

    likely some opportunities exist to simultaneously improve grain yield and grain size, eg 492

    through manipulation of the duration of grain filling (Yang et al., 2010). 493

    Correspondence of loci controlling grain size among cereals 494

    Sorghum shares close ancestry with maize and rice, with orthologous genes from these 495

    species commonly sharing the same function (Bolot et al., 2009; Paterson et al., 1995). A 496

    significant association was found between the locations of the QTL in this study and GWAS 497

    signals for grain size in rice (Huang et al., 2012) and maize (Liu et al., 2019), and 498

    orthologous of cloned grain size genes from rice and maize. Both findings strongly support a 499

    common genetic architecture underlying this trait across these cereals. This provides 500

    opportunities for breeders to exploit information from other cereals, as genes identified from 501

    large scale GWAS in one species can be targeted for allele mining or diversity creation via 502

    gene editing in another species. 503

    As an example, the gene SbGS3, the orthologue of the cloned gene Grain Size 3 in rice, was 504

    found to control variation of grain size in sorghum. In addition to the presence of a large-505

    effect SNP in the gene causing a premature stop codon, our GWAS identified a SNP that was 506

    11.48 kb away from this gene as significantly associated with variation of grain size. SbGS3 507

    showed a similar expression pattern as GS3 in rice with high expression in early 508

    inflorescence, indicating the same role in grain development. Further evidence from 509

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • transgenic experiments has shown that inhibiting the expression of SbGS3 increased grain 510

    weight by 9.4% on average in sorghum (Trusov et al., pers. comm.). It has also been reported 511

    that the orthologue of GS3 in maize, ZmGS3, was involved in kernel development (Li et al., 512

    2010). GS3 contains four domains: an OSR domain in the N terminus, a transmembrane 513

    domain, a TNFR/NGFR family cysteine-rich domain, and a VWFC in the C terminus (Mao et 514

    al., 2010). In rice, overexpression of a truncated cDNA sequence of GS3 where the VWFC 515

    domain is deleted, produced shorter grains (Mao et al., 2010). The premature stop change of 516

    SbGS3 caused a truncation of the VWFC domain. Further investigation is required to see 517

    whether comparable effects caused by the truncation occur in sorghum. 518

    519

    Conclusion: 520

    Grain size is a key quantitative trait of cereal crops. This study presents a comprehensive 521

    genetic dissection of grain size in sorghum using two large and complementary populations, 522

    resulting in the identification of a total of 92 QTL related to grain size. Significant overlap 523

    was found between the QTL identified in this study and grain size GWAS signals in rice and 524

    maize, and orthologues of cloned grain size genes from rice and maize, supporting a common 525

    genetic architecture underlying this trait across these cereals. These findings provide a better 526

    understanding of the genetic architecture of grain size in sorghum, and pave the way for 527

    exploration of its underlying molecular and physiological mechanisms in cereal crops and its 528

    manipulation in breeding practices. 529

    530

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Acknowledgments 531

    We acknowledge access to background IP from the Grains Research and Development 532

    Corporation and support from the University of Queensland and the Department of 533

    Agriculture and Fisheries Queensland. 534

    535

    Author contributions 536

    D.J., E.M., and I.G. conceived and designed the experiments: Y.T., X.Z., X.W. and A.C. 537

    collected data; Y.T., C.H., A.H., E.O., D.J., and E.M. analysed data; Y.T. wrote the 538

    manuscript. E.M., E.O., I.G., and D.J. revised the manuscript. All authors read and approved 539

    the final manuscript. 540

    541

    Funding 542

    This work was supported by the Australian Research Council (ARC) Discovery project 543

    DP14010250. 544

    545

    Conflict of Interest Statement 546

    The authors declare that the research was conducted in the absence of any commercial or 547

    financial relationships that could be construed as a potential conflict of interest. 548

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Reference 549

    550 Acreche MM, Slafer GA. 2006. Grain weight response to increases in number of grains in wheat in a 551

    Mediterranean area. Field Crops Research 98(1): 52-59. 552 Bai CM, Wang CY, Wang P, Zhu ZX, Cong L, Li D, Liu YF, Zheng WJ, Lu XC. 2017. QTL 553

    mapping of agronomically important traits in sorghum (Sorghum bicolor L.). Euphytica 554 213(12). 555

    Bolot S, Abrouk M, Masood-Quraishi U, Stein N, Messing J, Feuillet C, Salse J. 2009. The 'inner 556 circle' of the cereal genomes. Current opinion in plant biology 12(2): 119-125. 557

    Boyles RE, Cooper EA, Myers MT, Brenton Z, Rauh BL, Morris GP, Kresovich S. 2016. 558 Genome-Wide Association Studies of Grain Yield Components in Diverse Sorghum 559 Germplasm. The Plant Genome 9(2). 560

    Boyles RE, Pfeiffer BK, Cooper EA, Zielinski KJ, Myers MT, Rooney WL, Kresovich S. 2017. 561 Quantitative Trait Loci Mapping of Agronomic and Yield Traits in Two Grain Sorghum 562 Biparental Families. Crop Science 57(5): 2443-2456. 563

    Butler D, Cullis BR, Gilmour A, Gogel BJ. 2009. ASReml-R reference manual. Department of 564 Primary Industries and Fisheries, Brisbane, Australia. 565

    Bradbury PJ, Zhang Z, Kroon DE, Casstevens TM, Ramdoss Y, Buckler ES. 2007. TASSEL: 566 software for association mapping of complex traits in diverse samples. Bioinformatics 23(19): 567 2633-2635. 568

    Brown PJ, Klein PE, Bortiri E, Acharya CB, Rooney WL, Kresovich S. 2006. Inheritance of 569 inflorescence architecture in sorghum. Theoretical and applied genetics 113(5): 931-942. 570

    Browning BL, Browning SR. 2016. Genotype Imputation with Millions of Reference Samples. 571 American Journal of Human Genetics 98(1): 116-126. 572

    Cullis BR, Smith AB, Coombes NE. 2006. On the design of early generation variety trials with 573 correlated data. Journal of Agricultural Biological and Environmental Statistics 11(4): 381-574 393. 575

    Doyle JJ. 1987. A rapid DNA isolation procedure for small quantities of fresh leaf tissue. 576 Phytochemical Bulletin 19: 11-15. 577

    Feltus FA, Hart GE, Schertz KF, Casa AM, Kresovich S, Abraham S, Klein PE, Brown PJ, 578 Paterson AH. 2006. Alignment of genetic maps and QTLs between inter- and intra-specific 579 sorghum populations. Theoretical and applied genetics 112(7): 1295-1305. 580

    François OJRTiPGUG-A. 2016. Running structure-like population genetic analyses with R. 1-9. 581 Gambín B, Borrás L. 2010. Resource distribution and the trade‐off between seed number and seed 582

    weight: a comparison across crop species. J Annals of Applied Biology 156(1): 91-102. 583 Gelli M, Mitchell SE, Liu K, Clemente TE, Weeks DP, Zhang C, Holding DR, Dweikat IM. 584

    2016. Mapping QTLs and association of differentially expressed gene transcripts for multiple 585 agronomic traits under different nitrogen levels in sorghum. BMC plant biology 16. 586

    Han LJ, Chen J, Mace ES, Liu YS, Zhu MJ, Yuyama N, Jordan DR, Cai HW. 2015. Fine 587 mapping of qGW1, a major QTL for grain weight in sorghum. Theoretical and applied 588 genetics 128(9): 1813-1825. 589

    Huang XH, Zhao Y, Wei XH, Li CY, Wang A, Zhao Q, Li WJ, Guo YL, Deng LW, Zhu CR, et 590 al. 2012. Genome-wide association study of flowering time and grain yield traits in a 591 worldwide collection of rice germplasm. Nature genetics 44(1): 32-U53. 592

    Jordan DR, Mace ES, Cruickshank AW, Hunt CH, Henzell RG. 2011. Exploring and Exploiting 593 Genetic Variation from Unadapted Sorghum Germplasm in a Breeding Program. Crop 594 Science 51(4): 1444-1457. 595

    Li MX, Yeung JM, Cherny SS, Sham PC. 2012. Evaluating the effective numbers of independent 596 tests and significant p-value thresholds in commercial genotyping arrays and public 597 imputation reference datasets. Human Genetics 131(5): 747-756. 598

    Li Q, Yang XH, Bai GH, Warburton ML, Mahuku G, Gore M, Dai JR, Li JS, Yan JB. 2010. 599 Cloning and characterization of a putative GS3 ortholog involved in maize kernel 600 development. Theoretical and applied genetics 120(4): 753-763. 601

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Liu M, Tan X, Yang Y, Liu P, Zhang X, Zhang Y, Wang L, Hu Y, Ma L, Li Z, et al. 2019. 602 Analysis of the genetic architecture of maize kernel size traits by combined linkage and 603 association mapping. Plant Biotechnol J. doi: 10.1111/pbi.13188 604

    Liu X, Huang M, Fan B, Buckler ES, Zhang Z. 2016. Iterative Usage of Fixed and Random Effect 605 Models for Powerful and Efficient Genome-Wide Association Studies. PLoS Genetics 12(2): 606 e1005767. 607

    Mace E, Innes D, Hunt C, Wang X, Tao Y, Baxter J, Hassall M, Hathorn A, Jordan D. 2018. 608 The Sorghum QTL Atlas: a powerful tool for trait dissection, comparative genomics and crop 609 improvement. Theoretical and applied genetics. 610

    Mace E, Innes D, Hunt C, Wang XM, Tao YF, Baxter J, Hassall M, Hathorn A, Jordan D. 2019. 611 The Sorghum QTL Atlas: a powerful tool for trait dissection, comparative genomics and crop 612 improvement. Theoretical and applied genetics 132(3): 751-766. 613

    Mace ES, Hunt CH, Jordan DR. 2013. Supermodels: sorghum and maize provide mutual insight 614 into the genetics of flowering time. Theoretical and applied genetics 126(5): 1377-1395. 615

    Mace ES, Rami JF, Bouchet S, Klein PE, Klein RR, Kilian A, Wenzl P, Xia L, Halloran K, 616 Jordan DR. 2009. A consensus genetic map of sorghum that integrates multiple component 617 maps and high-throughput Diversity Array Technology (DArT) markers. BMC plant biology 618 9. 619

    Makita Y, Shimada S, Kawashima M, Kondou-Kuriyama T, Toyoda T, Matsui M. 2015. 620 MOROKOSHI: Transcriptome Database in Sorghum bicolor. Plant and cell physiology 56(1). 621

    Mao H, Sun S, Yao J, Wang C, Yu S, Xu C, Li X, Zhang Q. 2010. Linking differential domain 622 functions of the GS3 protein to natural variation of grain size in rice. Proceedings of the 623 National Academy of Sciences 107(45): 19579-19584. 624

    McCormick RF, Truong SK, Sreedasyam A, Jenkins J, Shu SQ, Sims D, Kennedy M, 625 Amirebrahimi M, Weers BD, McKinley B, et al. 2018. The Sorghum bicolor reference 626 genome: improved assembly, gene annotations, a transcriptome atlas, and signatures of 627 genome organization. Plant Journal 93(2): 338-354. 628

    Mocoeur A, Zhang YM, Liu ZQ, Shen X, Zhang LM, Rasmussen SK, Jing HC. 2015. Stability 629 and genetic control of morphological, biomass and biofuel traits under temperate maritime 630 and continental conditions in sweet sorghum (Sorghum bicolour). Theoretical and applied 631 genetics 128(9): 1685-1701. 632

    Murray SC, Sharma A, Rooney WL, Klein PE, Mullet JE, Mitchell SE, Kresovich S. 2008. 633 Genetic Improvement of Sorghum as a Biofuel Feedstock: I. QTL for Stem Sugar and Grain 634 Nonstructural Carbohydrates. Crop Science 48(6): 2165-2179. 635

    Paterson AH, Lin YR, Li Z, Schertz KF. 1995. Convergent domestication of cereal crops by 636 independent mutations at corresponding genetic loci. Science 269(5231): 1714. 637

    Peltonen-Sainio P, Kangas A, Salo Y, Jauhiainen L. 2007. Grain number dominates grain weight in 638 temperate cereal yield determination: Evidence based on 30 years of multi-location trials. 639 Field Crops Research 100(2-3): 179-188. 640

    Phuong N, Stützel H, Uptmoor R. 2013. Quantitative trait loci associated to agronomic traits and 641 yield components in a Sorghum bicolor L. Moench RIL population cultivated under pre-642 flowering drought and well-watered conditions. Agricultural Sciences 4(12): 781-791. 643

    Prado SA, Sadras VO, Borras L. 2014. Independent genetic control of maize (Zea mays L.) kernel 644 weight determination and its phenotypic plasticity. Journal of Experimental Botany 65(15): 645 4479-4487. 646

    Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MA, Bender D, Maller J, Sklar P, de 647 Bakker PI, Daly MJ, et al. 2007. PLINK: a tool set for whole-genome association and 648 population-based linkage analyses. American Journal of Human Genetics 81(3): 559-575. 649

    Rami JF, Dufour P, Trouche G, Fliedel G, Mestres C, Davrieux F, Blanchard P, Hamon P. 1998. 650 Quantitative trait loci for grain quality, productivity, morphological and agronomical traits in 651 sorghum (Sorghum bicolor L. Moench). Theoretical and applied genetics 97(4): 605-616. 652

    Rajkumar, Fakrudin B, Kavil SP, Girma Y, Arun SS, Dadakhalandar D, Gurusiddesh BH, Patil 653 AM, Thudi M, Bhairappanavar SB, et al. 2013. Molecular mapping of genomic regions 654 harbouring QTLs for root and yield traits in sorghum (Sorghum bicolor L. Moench). Physiol 655 Mol Biol Plants 19(3): 409-419. 656

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Reddy RN, Madhusudhana R, Mohan SM, Chakravarthi DVN, Mehtre SP, Seetharama N, Patil 657 JV. 2013. Mapping QTL for grain yield and other agronomic traits in post-rainy sorghum 658 [Sorghum bicolor (L.) Moench]. Theoretical and applied genetics 126(8): 1921-1939. 659

    Rosenow D, Dahlberg J, Stephens J, Miller F, Barnes D, Peterson G, Johnson J, Schertz KJCs. 660 1997. Registration of 63 converted sorghum germplasm lines from the sorghum conversion 661 program. 37(4): 1399-1400. 662

    Sadras VO. 2007. Evolutionary aspects of the trade-off between seed size and number in crops. Field 663 Crops Research 100(2): 125-138. 664

    Sands DC, Morris CE, Dratz EA, Pilgeram AL. 2009. Elevating optimal human nutrition to a 665 central goal of plant breeding and production of plant-based foods. Plant Science 177(5): 377-666 389. 667

    Sharma HC, Abraham CV, Stenhouse JW. 2002. Compensation in grain weight and volume in 668 sorghum is associated with expression of resistance to sorghum midge, Stenodiplosis 669 sorghicola. Euphytica 125(2): 245-254. 670

    Shehzad T, Okuno K. 2015. QTL mapping for yield and yield-contributing traits in sorghum 671 (Sorghum bicolor (L.) Moench) with genome-based SSR markers. Euphytica 203(1): 17-31. 672

    Smith A, Cullis B, Thompson R. 2001. Analyzing variety by environment data using multiplicative 673 mixed models and adjustments for spatial field trend. Biometrics 57(4): 1138-1147. 674

    Spagnolli FC, Mace E, Jordan D, Borras L, Gambin BL. 2016. Quantitative Trait Loci of Plant 675 Attributes Related to Sorghum Grain Number Determination. Crop Science 56(6): 3046-3054. 676

    Srinivas G, Satish K, Madhusudhana R, Reddy RN, Mohan SM, Seetharama N. 2009. 677 Identification of quantitative trait loci for agronomically important traits and their association 678 with genic-microsatellite markers in sorghum. Theoretical and applied genetics 118(8): 1439-679 1454. 680

    Stephens J, Miller F, Rosenow D. 1967. Conversion of Alien Sorghums to Early Combine 681 Genotypes 1. Crop Science 7(4): 396-396. 682

    Tao Y, Mace ES, Tai S, Cruickshank A, Campbell BC, Zhao X, Van Oosterom EJ, Godwin ID, 683 Botella JR, Jordan DR. 2017. Whole-Genome Analysis of Candidate genes Associated with 684 Seed Size and Weight in Sorghum bicolor Reveals Signatures of Artificial Selection and 685 Insights into Parallel Domestication in Cereal Crops. Frontiers in plant science 8. 686

    Tao YF, Mace E, George-Jaeggli B, Hunt C, Cruickshank A, Henzell R, Jordan D. 2018. Novel 687 Grain Weight Loci Revealed in a Cross between Cultivated and Wild Sorghum. The Plant 688 Genome 11(2). 689

    Thurber CS, Ma JM, Higgins RH, Brown PJ. 2013. Retrospective genomic analysis of sorghum 690 adaptation to temperate-zone grain production. Genome biology 14(6). 691

    Tuinstra MR, Grote EM, Goldsbrough PB, Ejeta G. 1997. Genetic analysis of post-flowering 692 drought tolerance and components of grain development in Sorghum bicolor (L.) Moench. 693 Molecular Breeding 3(6): 439-448. 694

    Upadhyaya HD, Wang YH, Sharma S, Singh S, Hasenstein KH. 2012. SSR markers linked to 695 kernel weight and tiller number in sorghum identified by association mapping. Euphytica 696 187(3): 401-410. 697

    Visscher PM, Wray NR, Zhang Q, Sklar P, McCarthy MI, Brown MA, Yang J. 2017. 10 Years 698 of GWAS Discovery: Biology, Function, and Translation. American Journal of Human 699 Genetics 101(1): 5-22. 700

    Wardlaw IF. 1990. The control of carbon partitioning in plants. New Phytologist 116(3): 341-381. 701 Yang Z, Van Oosterom EJ, Jordan DR, Hammer GL. 2009. Pre-anthesis ovary development 702

    determines genotypic differences in potential kernel weight in sorghum. Journal of 703 Experimental Botany: erp019. 704

    Yang ZJ, van Oosterom EJ, Jordan DR, Doherty A, Hammer GL. 2010. Genetic Variation in 705 Potential Kernel Size Affects Kernel Growth and Yield of Sorghum. Crop Science 50(2): 706 685-695. 707

    Zhang C, Dong SS, Xu JY, He WM, Yang TL. 2018. PopLDdecay: a fast and effective tool for 708 linkage disequilibrium decay analysis based on variant call format files. Bioinformatics. 709

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Zhang D, Li J, Compton RO, Robertson J, Goff VH, Epps E, Kong W, Kim C, Paterson AH. 710 2015. Comparative genetics of seed size traits in divergent cereal lineages represented by 711 sorghum (Panicoidae) and rice (Oryzoidae). G3: Genes| Genomes| Genetics 5(6): 1117-1128. 712

    713

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Table 1 Summary of phenotypic data collected from half head samples across trials in the 714

    diversity panel 715

    Trait Trial Trait Mean Range Hsq Correlation between Sites

    TKW (g) DPGAT16 28.8 6.5 - 58.4 0.79 0.87

    TKW (g) DPHER16 28.7 6.7 - 56.8 0.80

    Volume (mm3) DPGAT16 26.2 8.4 - 55.0 0.67 0.85

    Volume (mm3) DPHER16 26.1 9.9 - 55.6 0.70

    Length (mm) DPGAT16 4.41 2.93 - 5.85 0.68 0.88

    Length (mm) DPHER16 4.38 3.07 -5.88 0.70

    Width (mm) DPGAT16 3.50 2.29 -4.74 0.65 0.84

    Width (mm) DPHER16 3.46 2.34 -4.56 0.67

    Thickness (mm) DPGAT16 3.22 2.4 - 4.05 0.67 0.82

    Thickness (mm) DPHER16 3.25 2.64 - 3.96 0.75

    716

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Table 2 Summary of the phenotypic data collected from half head samples across trials in the 717

    BC-NAM population 718

    Trait Trial Mean Range Hsq Correlations across trials

    NAMGAT15 NAMGAT16 NAMHER15

    TKW (g) NAMGAT15 28.2 15.6 - 41.2 0.77 1 0.89 0.87

    TKW (g) NAMGAT16 29.9 16.9 - 44.5 0.53 0.89 1 0.98

    TKW (g) NAMHER15 30.8 13.9 - 45.6 0.66 0.87 0.98 1

    Volume (mm3) NAMGAT15 25.5 14.6 - 40.5 0.36 1 0.91 0.89

    Volume (mm3) NAMGAT16 25.3 15.0 - 38.7 0.37 0.91 1 0.81

    Volume (mm3) NAMHER15 25.7 15.4 - 36.4 0.41 0.89 0.81 1

    Length (mm) NAMGAT15 4.18 3.20 - 5.29 0.41 1 0.92 0.91

    Length (mm) NAMGAT16 4.17 3.29 - 5.31 0.45 0.92 1 0.85

    Length (mm) NAMHER15 4.18 3.31 - 5.07 0.40 0.91 0.85 1

    Width (mm) NAMGAT15 3.47 2.58 - 4.32 0.29 1 0.91 0.87

    Width (mm) NAMGAT16 3.47 2.65 - 4.26 0.39 0.91 1 0.79

    Width (mm) NAMHER15 3.48 2.76 - 4.15 0.29 0.87 0.79 1

    Thickness (mm) NAMGAT15 3.32 2.71 - 3.72 0.54 1 0.93 0.89

    Thickness (mm) NAMGAT16 3.31 2.85 - 3.69 0.41 0.93 1 0.86

    Thickness (mm) NAMHER15 3.35 2.9 - 3.78 0.59 0.89 0.86 1

    719

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Table 3 Summary of the grain size QTL identified in the diversity and BC-NAM populations 720

    QTL ID Chr P-start P-end G-start G-end Traits affected

    Overlap DP BC-NAM

    qGS1.1 1 1197434 1197434 3.18 3.18 _VL__ TVL__ Y

    qGS1.2 1 3551580 5918355 10.35 13.66 TVLWH TVLWH Y

    qGS1.3 1 6907712 7312784 16.89 18.27 TVLWH TVLWH Y

    qGS1.4 1 9747355 9819344 25.45 25.48 TVLWH _____ Y

    qGS1.5 1 12613546 13193518 31.25 33.54 _V___ TVLWH Y

    qGS1.6 1 17427801 17638035 43.49 43.60 TVLW_ TV__H Y

    qGS1.7 1 21347462 21457138 46.68 46.73 TVLW_ __L_H N

    qGS1.8 1 53201999 53936261 59.15 60.07 TV__H TVLWH N

    qGS1.9 1 62886564 62899303 94.75 94.76 TVLW_ TVLWH N

    qGS1.10 1 64100851 64100851 98.57 98.57 TVLWH TVLW_ N

    qGS1.11 1 67340797 67575811 115.03 116.85 T_L__ _V__H Y

    qGS1.12 1 68155790 68372488 123.06 125.38 ____H TVLWH Y

    qGS1.13 1 69803971 69803971 129.89 129.89 ____H __L_H N

    qGS1.14 1 75731963 75731963 156.66 156.66 T_LWH TVLW_ Y

    qGS2.1 2 1044805 1044805 0.10 0.10 TV_WH _VLW_ N

    qGS2.2 2 2902001 2977700 13.86 13.93 TV_WH TVLWH N

    qGS2.3 2 10134643 10134643 56.44 56.44 ____H TV__H Y

    qGS2.4 2 10821637 10821637 59.55 59.55 TVLWH TVLWH Y

    qGS2.5 2 57031598 57031598 88.94 88.94 TV_WH ____H N

    qGS2.6 2 57512343 57512343 93.17 93.17 TV_WH TV_W_ N

    qGS2.7 2 59194799 59194799 106.68 106.68 _VL__ TVLWH N

    qGS2.8 2 60589218 61587359 121.99 123.92 TVLWH TVLWH N

    qGS2.9 2 62351052 62351052 134.53 134.53 T____ __LWH N

    qGS2.10 2 66339464 66573617 148.73 149.23 ___W_ ____H Y

    qGS2.11 2 67288493 67673759 152.32 152.63 __L__ TVLW_ Y

    qGS2.12 2 69049000 69049000 159.86 159.86 TV_WH TVLW_ N

    qGS2.13 2 70379875 70379875 166.28 166.28 TVLW_ TVLW_ Y

    qGS2.14 2 73307187 74155649 180.56 182.44 TVLWH __LWH Y

    qGS2.15 2 75213858 75310814 188.18 189.09 TV_WH TVLWH Y

    qGS3.1 3 617297 617297 0.68 0.68 _____ TVLWH N

    qGS3.2 3 1678176 1681138 4.04 4.05 _____ TVLW_ N

    qGS3.3 3 2972919 2972919 9.19 9.19 TV_W_ T___H Y

    qGS3.4 3 3597488 3597488 11.06 11.06 TVLWH TVLWH Y

    qGS3.5 3 7459268 7459268 31.12 31.12 TV_W_ TVL_H N

    qGS3.6 3 11842474 13099669 46.52 48.31 TVLW_ TVLWH N

    qGS3.7 3 27806456 52373180 56.79 58.13 TV_WH T_LWH Y

    qGS3.8 3 54907952 55299538 70.31 72.25 TVLWH TVLW_ Y

    qGS3.9 3 58595834 58595834 102.06 102.06 T__WH T_L_H Y

    qGS3.10 3 70185356 70185356 149.33 149.33 TV_WH TV_WH Y

    qGS3.11 3 70935875 70935875 151.89 151.89 TVLWH T___H Y

    qGS3.12 3 71518340 71802981 154.39 155.80 TVLWH TVLWH Y

    qGS4.1 4 4223889 4229830 27.71 27.75 TV_WH TVL__ N

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • qGS4.2 4 4559844 4559844 29.99 29.99 TVLWH _____ N

    qGS4.3 4 7553755 7553755 55.81 55.81 ____H _____ Y

    qGS4.4 4 13834090 49255608 71.37 73.60 TVLW_ TVLWH Y

    qGS4.5 4 51365085 52404369 82.14 84.01 ____H TVLWH Y

    qGS4.6 4 54000479 54337837 94.49 94.71 TVLW_ TVLWH Y

    qGS4.7 4 57705140 58486661 106.83 106.99 TV_WH TVLW_ Y

    qGS4.8 4 61396481 61772353 121.78 123.63 TVLW_ TVLWH Y

    qGS4.9 4 66127254 66127254 152.27 152.27 TVLW_ __L__ N

    qGS5.1 5 3301226 3575220 22.87 23.25 TVLWH TVLW_ Y

    qGS5.2 5 38549008 54107776 64.58 65.76 _VLW_ TVLW_ Y

    qGS5.3 5 61218249 61218249 70.70 70.70 _VL__ _VL_H N

    qGS5.4 5 62745873 63339491 73.38 73.89 TVLWH TVLW_ Y

    qGS5.5 5 70963740 71211419 118.00 119.00 TVLW_ T____ N

    qGS6.1 6 2325384 2491386 28.20 30.03 _____ TVLWH N

    qGS6.2 6 3407678 3594201 32.68 33.68 T___H TVLWH Y

    qGS6.3 6 39213130 43198259 50.50 52.75 ___W_ TVLWH Y

    qGS6.4 6 49661049 49672525 89.27 89.28 __L__ ____H Y

    qGS6.5 6 54391413 55406078 125.43 125.64 __LWH ____H N

    qGS6.6 6 58659162 58659162 157.96 157.96 ___WH __LW_ Y

    qGS6.7 6 59994049 59994049 162.94 162.94 T__WH TVLWH Y

    qGS7.1 7 1463261 1463261 16.76 16.76 TVLW_ _____ N

    qGS7.2 7 4810838 5140654 58.13 58.36 TVLWH _VLW_ N

    qGS7.3 7 14300295 38985351 72.21 72.30 TVLWH TV_W_ Y

    qGS7.4 7 52822136 54307603 74.35 75.24 __LW_ TV_W_ Y

    qGS7.5 7 55675278 55675278 78.67 78.67 TVLWH TVL__ N

    qGS7.6 7 60263942 60263942 106.63 106.63 TVLW_ ___W_ Y

    qGS8.1 8 5474894 5997557 58.65 59.32 TVLWH TVLWH Y

    qGS8.2 8 10712689 10712689 67.33 67.33 TV_WH _V___ Y

    qGS8.3 8 58069299 58069299 87.75 87.75 ____H _VLW_ N

    qGS8.4 8 59873105 59873105 99.35 99.35 TV__H TV_WH N

    qGS8.5 8 60306779 60394256 102.58 103.12 TVLW_ TVLWH N

    qGS8.6 8 60911363 60911363 106.38 106.38 T____ TVLWH N

    qGS8.7 8 61382971 61382971 109.44 109.44 ____H T____ N

    qGS8.8 8 61773861 61868131 111.98 112.60 _____ TVLWH N

    qGS8.9 8 61977263 61977263 118.53 118.53 ____H TVL_H N

    qGS8.10 8 62329242 62329242 127.16 127.16 _____ TV__H N

    qGS9.1 9 1188287 1188287 14.45 14.45 TV_WH T____ N

    qGS9.2 9 8955692 15253529 67.33 69.39 TVLW_ TVLWH Y

    qGS9.3 9 42995635 45347175 75.64 76.17 TVLWH TVLWH Y

    qGS9.4 9 51441730 51441730 94.86 94.86 TV_WH __L_H Y

    qGS9.5 9 52757861 52795444 104.82 104.93 T___H T___H N

    qGS9.6 9 53947680 53959901 108.19 108.23 T__W_ TVLWH N

    qGS9.7 9 56099631 56099631 113.99 113.99 _VL__ TVLWH N

    qGS9.8 9 58170657 58198133 128.11 128.33 TV_WH TVLW_ N

    qGS10.1 10 777170 777170 13.22 13.22 _V__H T_L_H N

    qGS10.2 10 3158490 3445290 31.94 32.55 TVLWH T___H Y

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • qGS10.3 10 9752387 12131845 53.93 56.00 TVLWH TVLWH Y

    qGS10.4 10 18074205 40511327 57.96 58.58 TVLWH TVLWH Y

    qGS10.5 10 56447108 56447108 89.00 89.00 ____H T___H N

    qGS10.6 10 58653903 60172861 102.30 103.56 TVLWH TVLW_ N

    721

    Chr indicates the chromosome on which the QTL is located. P-start provides the physical coordinate, 722

    based on v3.1.1 of the sorghum genome assembly, of the start of the QTL CI. P-end provides the 723

    physical coordinate of the end of the QTL CI. G-start provides the predicted cM genetic linkage 724

    position of the start of the QTL CI based on the sorghum consensus map (Mace et al 2009). G-end 725

    provides the predicted cM genetic linkage position of the end of the QTL CI. The traits affected 726

    columns indicate the traits significantly associated with the QTL in diversity panel (DP) and BC-727

    NAM population respectively. The five traits were recorded as TVLWH, where T represents TKW, V 728

    represents volume, L represents length, W represents width and H represents thickness. The inclusion 729

    of the letters for a particular QTL means that the corresponding traits are affected by the QTL in the 730

    corresponding population. Missing letters in the order of TVLWH (represented as “_”) mean that the 731

    corresponding traits are not affected by the QTL in corresponding population. The Overlap column 732

    indicates whether the QTL overlaps with previously published grain size QTL in sorghum. N means 733

    the QTL is not overlapped with previously published grain size QTL, while Y mean the QTL is 734

    overlapped with previously published QTL. 735

    736

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Figure legends 737

    Figure 1 Field treatment and its effect on grain size. A) Field treatment of removing half of 738

    panicle imposed during flowering time. B) Significant increase of TKW and volume in half 739

    heads compared to full heads in DPGAT16. C) Varying increase percentage in TKW of half 740

    heads observed across genotypes. D) box-plots show variations of increase in TKW and 741

    volume in half heads across genotypes in DPGAT16. 742

    Figure 2: Phenotypic distribution and correlations of 5 grain size parameters across the 743

    diversity panel and BC-NAM population. The plots on the diagonal show the phenotypic 744

    distribution of each trait. The values above the diagonal are pairwise correlation coefficients 745

    between traits, and the plots below the diagonal are scatter plots of compared traits. 746

    Figure 3: GWAS of 5 grain size parameters in the diversity panel. GWAS results were 747

    displayed in Manhattan plots and Q-Q plots. Significant threshold (solid black line) and 748

    suggest threshold (dash black line) were 1.45E-06 and 2.91E-05, respectively, according to 749

    GEC test. 750

    Figure 4: GWAS of 5 grain size parameters in the BC-NAM population. GWAS results were 751

    displayed in Manhattan plots and Q-Q plots. Significant threshold (solid black line) and 752

    suggest threshold (dash black line) were 5.83E-06 and 1.17E-04, respectively, according to 753

    GEC test. 754

    Figure 5: Effects of landraces alleles of grain size QTL relative to elite recurrent parents in 755

    the BC-NAM population. Each box plot illustrates the effects of exotic alleles of grain size 756

    QTL across 17 BC-NAM families derived from crosses between landraces and elite recurrent 757

    parental lines. Within each family, effects of a QTL was calculated as difference between 758

    average of the exotic allele and average of allele of the recurrent parental line. Sixty-six QTL 759

    identified in BC-NAM were ordered based on the median effect of the exotic alleles. 760

    Correspondence between numbers on x-axis and QTL identity is provided in Table S16. 761

    Figure 6: The overlap of grain size QTL identified in the diversity panel and the BC-NAM 762

    population. 763

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Supplementary materials 764

    Supplementary Table S1 List of the diversity panel. a, country and region information was 765

    obtained from USDA GRIN (https://npgsweb.ars-grin.gov/gringlobal/search.aspx) and 766

    personal communication with Jeffery A. Dahlberg. b, race was classified based on population 767

    structure analysis, as explained in materials and methods. c records overlap between diversity 768

    panel and USSAP according to GRIN. 769

    Supplementary Table S2 Summary of the BC-NAM 770

    Supplementary Table S3 Summary of field experiments 771

    Supplementary Table S4 Racial group breakdown of the diversity panel 772

    Supplementary Table S5 List of reported grain size QTL in previous studies 773

    Supplementary Table S6 Lis of reported grain size genes in rice and maize 774

    Supplementary Table S7 Effect of population structure on grain size. Statistical significance 775

    was assessed using one-way analyses of variance (ANOVAs) followed by Tukey’s HSD tests 776

    for multiple comparisons. 777

    Supplementary Table S8 Marker Trait associations in the diversity panel. Physical position of 778

    SNPs were according to Sorghum bicolor v3.1.1. 779

    Supplementary Table S9 Effects of QTL in the diversity panel 780

    Supplementary Table S10 Marker Trait associations in BC-NAM. Physical position of SNPs 781

    were according to Sorghum bicolor v3.1.1. 782

    Supplementary Table S11 Effects of QTL in BC-NAM 783

    Supplementary Table S12 Tests of associations between population-specific QTL with grain 784

    size parameters in the alternative population 785

    Supplementary Table S13 Comparison of grain size regions identified in rice and maize with 786

    grain size QTL identified in this study 787

    Supplementary Table S14 List of grain size candidate genes in sorghum 788

    Supplementary Table S15 Summary of candidate genes within close vicinities of grain size 789

    QTL 790

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://npgsweb.ars-grin.gov/gringlobal/search.aspxhttps://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • 791

    Supplementary Table S16 Correspondence between numbers on x-axis of Figure 5 and 792

    identities of QTL identified in BC-NAM. 793

    Supplementary Figure S1 PCA and LD of BC-NAM. A, PCA of BC-NAM. B, PCA of 794

    diversity panel shows the BC-NAM parental lines capture a good proportion of diversity in 795

    the diversity panel. Dots in red colour represent non-recurrent parental lines. Dots in green 796

    represent recurrent parental lines. C, LD decay in the BC-NAM. 797

    Supplementary Figure S2 Bi-plots show PCA analysis of grain size parameters measured in 798

    the diversity panel. 799

    Supplementary Figure S3 Bi-plots show PCA analysis of grain size parameters traits 800

    measured in BC-NAM. 801

    Supplementary Figure S4 LD decay, Population structure and PCA of the diversity panel. A, 802

    LD decay of the diversity panel. B, population structure of the diversity panel. C, PCA of the 803

    diversity panel, dots in colour indicate representatives for each racial group. 804

    Supplementary Figure S5 Distribution of SNPs across sorghum genome in the diversity 805

    panel. 806

    Supplementary Figure S6 Distribution of SNPs across sorghum genome in BC-NAM. 807

    Supplementary Figure S7. Correlation of grain size between HH and FH. A, correlation of 808

    TKW between HH and FH. B, correlation of volume between HH and FH. 809

    Supplementary Figure S8 Manhattan plots and Q-Q plots show GWAS analysis of PCs 810

    derived from grain size parameters in the diversity panel. 811

    Supplementary Figure S9 Manhattan plots and Q-Q plots show GWAS analysis of PCs 812

    derived from grain size parameters in BC-NAM. 813

    Supplementary Figure 10: SbGS3 (Sobic.001G341700) controls grain size in sorghum. A) 814

    Expression pattern of sbGS3 in sorghum. B) Varying effects of sbGS3 alleles in non-recurrent 815

    parental lines compared with the recurrent parental line. 816

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Vo

    lum

    e (

    mm

    3)

    Th

    ou

    sand

    ke

    rne

    l w

    eig

    ht (g

    )

    P-value

  • Figure 2

    BC-NAMDP

    (g)

    (mm3)(mm3)

    (g)

    (mm)

    (mm)

    (mm)

    (mm)

    (mm)

    (mm)

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • TKW

    Volume

    Length

    Width

    Thickness

    Figure 3 GWAS in DP

    1 2 3 4 5 6 7 8 9 10

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Figure 4 GWAS in BC-NAM

    TKW

    Volume

    Length

    Width

    Thickness

    1 2 3 4 5 6 7 8 9 10

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Figure 5TK

    W(g

    )

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/

  • Figure 6

    DP NAM

    2826 38

    .CC-BY-NC-ND 4.0 International licensenot certified by peer review) is the author/funder. It is made available under aThe copyright holder for this preprint (which wasthis version posted July 22, 2019. . https://doi.org/10.1101/710459doi: bioRxiv preprint

    https://doi.org/10.1101/710459http://creativecommons.org/licenses/by-nc-nd/4.0/