for litter size in sheep Meta-analysis of genome-wide ...

20
Page 1/20 Meta-analysis of genome-wide association studies for litter size in sheep Mohsen Gholizade ( [email protected] ) Department of Animal Sciences and Fisheries, Sari Agricultural Sciences and Natural Resources University, Iran. Seyed Mehdi Esmaeili-Fard Department of Animal Sciences and Fisheries, Sari Agricultural Sciences and Natural Resources University, Iran. Research Article Keywords: Meta-analysis, GWAS, Pathway enrichment, Network analysis, Litter size, Sheep Posted Date: August 2nd, 2021 DOI: https://doi.org/10.21203/rs.3.rs-770225/v1 License: This work is licensed under a Creative Commons Attribution 4.0 International License. Read Full License Version of Record: A version of this preprint was published at Theriogenology on December 1st, 2021. See the published version at https://doi.org/10.1016/j.theriogenology.2021.12.025.

Transcript of for litter size in sheep Meta-analysis of genome-wide ...

Page 1: for litter size in sheep Meta-analysis of genome-wide ...

Page 1/20

Meta-analysis of genome-wide association studiesfor litter size in sheepMohsen Gholizade  ( [email protected] )

Department of Animal Sciences and Fisheries, Sari Agricultural Sciences and Natural ResourcesUniversity, Iran.Seyed Mehdi Esmaeili-Fard 

Department of Animal Sciences and Fisheries, Sari Agricultural Sciences and Natural ResourcesUniversity, Iran.

Research Article

Keywords: Meta-analysis, GWAS, Pathway enrichment, Network analysis, Litter size, Sheep

Posted Date: August 2nd, 2021

DOI: https://doi.org/10.21203/rs.3.rs-770225/v1

License: This work is licensed under a Creative Commons Attribution 4.0 International License.  Read Full License

Version of Record: A version of this preprint was published at Theriogenology on December 1st, 2021.See the published version at https://doi.org/10.1016/j.theriogenology.2021.12.025.

Page 2: for litter size in sheep Meta-analysis of genome-wide ...

Page 2/20

Abstract

BackgroundLitter size and ovulation rate are important reproduction traits in sheep and have important impacts onthe pro�tability of farm animals. To investigate the genetic architecture of litter size, we report the �rstmeta-analysis of genome-wide association studies (GWAS) using 522 ewes and 564,377 SNPs from sixsheep breeds.

ResultsWe identi�ed 29 signi�cant associations for litter size which 27 of which have been not reported inGWASs for each population. However, we could con�rm the role of BMPR1B in proli�cacy. Our gene setanalysis discovered biological pathways related to cell signaling, communication, and adhesion.Functional clustering and enrichment using protein databases identi�ed epidermal growth factor-likedomain affecting litter size. Through analyzing protein-protein interaction data, we could identify hubgenes like CASK, PLCB4, RPTOR, GRIA2, and PLCB1 that were enriched in most of the signi�cantpathways. These genes have a role in cell proliferation, cell adhesion, cell growth and survival, andautophagy.

ConclusionsNotably, identi�ed SNPs were scattered on several different chromosomes implying different geneticmechanisms underlying variation of proli�cacy in each breed. Given the different layers that make up thefollicles and the need for communication and transfer of hormones and nutrients through these layers tothe oocyte, the signi�cance of pathways related to cell signaling and communication seems logical. Ourresults provide genetic insights into the litter size variation in different sheep breeds.

IntroductionReproduction is one of the most important economic traits in sheep. Identi�cation of candidate genesand their causal mutations is an effective tool to understand the genetic mechanism underlying thevariation in reproductive performance in sheep (1). Fertility, fecundity, and proli�cacy are all indicators ofreproductive performance in sheep. Fertility is considered as the number of lambing per year, fecundity ismeasured as the number of lambs produced each year, and proli�cacy is estimated as the size of thelitter (2). So far, a variety of fecundity genes have been discovered in sheep. The �rst underlying mutationlinked to increased proli�cacy in sheep was identi�ed in the Bone Morphogenetic Protein Receptor 1B(BMPR1B) gene (3–5). Also, other variants associated with increased proli�cacy in sheep have beenreported in BMP15 (FecX) (6), GDF9 (FecG) (7), and, B4GALNT2 (FecL) (8) genes. Since these identi�edmutations could be breed-speci�c and even population-speci�c, further study of the genome in different

Page 3: for litter size in sheep Meta-analysis of genome-wide ...

Page 3/20

breeds/populations is recommended to identify novel causal variations associated with proli�cacy insheep.

The advent of the high-throughput genotyping technologies that facilitate the genotyping of thousands ofsingle-nucleotide polymorphisms (SNPs) has made it possible to utilize the Genome-wide associationstudies in identifying molecular markers associated with quantitative traits which provide newopportunities for marker-assisted selection and genomic selection. This could allow breeding programs tobecome more e�cient and result in much more increases in response to selection (9, 10).

Several GWAS for reproduction traits in sheep have been conducted in recent years and successfullyreported genetic markers. Demars et al. (2013) identi�ed a region close to the BMP15 gene onchromosome x and reported two non-conservative mutations called FecXGR and FecXO associated withlitter size and ovulation rate in the Grivette and Olkuska breeds, respectively. Gholizadeh et al. (2014)identi�ed Signi�cant SNPs on chromosomes 10 and 15 (within the DYNC2H1 gene) associated with littersize across the four parities in Baluchi sheep. Benavides and Souza. (2018) performed a GWAS inVacaria sheep population in which, an associated causal SNP within the GDF9 gene (Vacaria mutation)segregates. Calvo et al., (2020) identi�ed the FecXGR allele associated with proli�cacy in 158 RasaAragonesa ewes. They also identi�ed a novel polymorphism in exon 2 of BMP15 using the PCR-sequencing approach. Smołucha et al. (2021) reported a genome-wide signi�cant SNP located within theEPHA6 gene associated with litter size in Polish Mountain sheep. Esmaeili-Fard et al. (2021) reportedNTRK2 as a novel candidate gene associated with litter size in Baluchi sheep. Xia et al., (2020) alsoidenti�ed both 105,603,287 lncRNA and its target gene, NTRK2, as signi�cantly upregulated genesassociated with proli�cacy in the Han sheep.

When it comes to exploring candidate genes and discovering new mutations underlying variation in thetrait of interest, optimal power to correctly reject the null hypothesis is crucial. Combining datasets couldresult in more power. Meta-analysis is an approach that makes it possible to combine data from variousstudies. These techniques also provide an opportunity for a quantitative assessment of the �ndings’accuracy or inconsistency/heterogeneity through multiple studies (17). Also, the protein-proteininteraction (PPI) networks are of special interest because the genes usually act as a group to achieve acollective goal. The networks may correlate to speci�c functions. PPI networks provide valuableinformation in the understanding of the cellular function and biological processes.

In the previous study, (1) through six independent GWAS in high and low proli�c sheep breeds (sixbreeds), identi�ed multiple candidate genes (BMPR1B, FBN1, MMP2, GRIA2, SMAD1, CTNNB1, NCOA1,INHBB, NF1, FLT1, PTGS2, PLCB3, ESR2, ESR1, GHR, ETS1, MMP15, FLI1, and SPP1) associated with littersize. Our study aimed to investigate genetic mechanisms underlying litter size in sheep using the meta-analysis approach. Meta-analysis of these six independent GWAS could improve the power to identifymore variants associated with litter size. We also conducted gene set analysis based on the results ofmeta-analysis as a complementary approach to investigate litter size from a biological perspective. Also,

Page 4: for litter size in sheep Meta-analysis of genome-wide ...

Page 4/20

With the tremendous increase in protein interaction data, we applied network-based analyses tounderstand molecular mechanisms of litter size.

Materials And Methods

GWAS and Meta-AnalysisWe considered genotypes and phenotypes of 522 ewes from �ve sheep breeds of high (Wadi, n = 160; Hu,n = 117; Icelandic, n = 54; Finnsheep, n = 54; and Romanov, n = 78) and one low (Texel, n = 59) proli�cacy.The GWA analysis of these breeds for proli�cacy has already been published by (1) and details aboutsample collection, phenotyping, and genotyping can be found in that paper. All the sheep breeds weregenotyped for almost 600k positions using the Ovine In�nium HD BeadChip (Illumina, San Diego, CA,United States). First, conducting six separate GWAS, we have reproduced all GWAS results for each breed.Quality control of genotyping was carried out using the GenABEL (18) R package. SNPs with call rates <95%, minor allele frequency (MAF) < 0.05, and p-value of Fisher’s exact test for Hardy–Weinbergequilibrium < 10-5 excluded from the analysis. Also, samples with call rates ≤ 95% were removed fromdatasets.

To remove the parity effect, the mean litter size of each individual (total litter size/parity) was used as theresponse variable. Based on the distribution of phenotypes, ewes in each breed, coded as high- and low-proli�c (for details see (1)). To account for the within-population substructure, we implemented aprincipal component analysis (PCA) using a set of LD-pruned SNPs and two �rst PC were incorporated inthe model. Plink1.9 (19) and GenABEL (18) were used, respectively. We ran the following logistic model toevaluate the association between SNPs and litter size in each breed: see formula 1 in the supplementary�les section.

where, y is the vector of case/control value (1 and 2) for each ewe; XSNP is the incidence matrix relatingvalues in y vector to SNP genotypes, and βSNP is the regression coe�cient. The analysis was performedusing the GenABEL (18) package. The p-values were almost non-in�ated (0.87 ≤ λ ≤ 1.14) for all traitsand partial in�ation was corrected using the genomic control (GC) method.

A single-trait meta-analyses were undertaken across GWAS results from six sheep breeds using the �xedeffects inverse variance weighted method implemented in the METAL (20) software. This meta-analyticalapproach works based on the GWAS SNPs summary statistics - beta and its standard errors - and isconsidered the most powerful method to identify novel loci (21). For multiple testing corrections, basedon the p-value cutoff of 0.05 and the total number of pruned SNPs (independent tests) we determined thegenome-wide threshold (31,752/0.05=1.57e-06). Also, based on the average number of SNPs on eachchromosome and the same P-value cutoff, we determined a chromosome-wise (suggestive) threshold(0.05/1176=4.25e-05). The CMplot (https://github.com/YinLiLin/R-CMplot) R package was used fordrawing the Manhattan plot.

Page 5: for litter size in sheep Meta-analysis of genome-wide ...

Page 5/20

Functional AnnotationWe explored the functional signi�cance of the top SNPs from the meta-analysis using the GTF �le fromOar_v.4.0 ovine reference genome assembly. The search was performed using a 15 k base pair windowaround the top SNPs. GeneCards (https://www.genecards.org/, last access: 20 April 2021) and NationalCenter for Biotechnology Information Gene (http://www.ncbi.nlm.nih.gov/gene/, last access: 20 April2021) were used to reveal gene functions. The sheep QTL database (http://www.animalgenome.org/cgi-bin/QTLdb/OA/index, last access: 20 April 2021) was used to �nd whether the identi�ed SNPs located inknown proli�cacy-related QTLs.

Gene-set enrichment and functional clustering analysisFor enrichment analysis, we need to construct a vector contains signi�cant gene names. As the trait haspolygenic nature, several poorly associated SNPs fall below the signi�cant line and the large majority ofthe genetic variants contributing to the trait remain hidden. So, to capture the cumulative effects of poorlyassociated variants, we used an arbitrary threshold (0.001) to �lter SNPs for constructing gene vector.Using an in-home R script and Oar_v.4.0 ovine reference genome assembly, we wrapped SNPs around thegenes and assigned SNPs to genes if they were located within the genes or 15 kb upstream ordownstream of a gene (list of signi�cant genes in gene vector are provided in Supplementary File 1, sheet5).

To extract biological views from the resulted gene vector, Gene Ontology (GO) and Kyoto Encyclopedia ofGenes and Genomes (KEGG) databases were used to de�ne the functional gene-sets. KEGG enrichmentanalysis was performed using the R package clusterPro�ler (22), but due to the low gene id conversionrate (to ensemble id), we used g:Pro�ler (23), a web-based enrichment tool, for GO enrichment which ledto better conversion rate and results. Terms and pathways with FDR and q-value less than 0.05 (GO andKEGG, respectively) were reported as signi�cant sets. The ggplot2 (24) R package was used to visualizeenrichment results as bar plots. For enrichment using protein databases, Cytoscape 3.8.2 (25) along withStringApp 1.6.0 plug-in (26) were used.

Signi�cant genes (o�cial gene symbol) were loaded in DAVID (27,28) for functional clustering analysisagainst the ovine genes as background. This tool classi�es genes in functionally related groups based ongene ontology, protein data, and pathway data, and calculates enrichment scores for groups of geneswith a related biological function. Analysis was performed using the DAVID default setting. Thefunctional cluster with the greatest enrichment score was reported.

Protein-protein interactive (PPI) network construction andmodule identi�cation

Page 6: for litter size in sheep Meta-analysis of genome-wide ...

Page 6/20

To investigate the interactions of proteins encoded by the identi�ed signi�cant genes, the StringApp 1.6.0(26) plugin in Cytoscape 3.8.2 (25) was used to build the PPI network using the STRING 11.0 database(29). The MCODE 2.0.0 (30) plugin, another Cytoscape App, was used to identify the very closelyassociated regions within the network which are known as the core modules of the PPI. MCODE identi�esthe highly interacting nodes (clusters) present in the network which has high quality based on topology,function and, localization similarity of proteins within predicted complexes. Degree cut-off 4 along withthe 2 k‐core were used as thresholds. Each node gets a score and clusters will be ranked using scoring bythe density and size of the cluster. Clusters with a higher score have more interaction between genes. Toidentify key genes in the constructed network, another Cytoscape App, cytoHubba 0.1 (31) was used torank the individual nodes and �nding hub genes. The hub genes were identi�ed based on degreeparameters, which represent the total number of nodes connected to its adjacent nodes. It also providesthe count of the number of interactions of a given node. The top ten hup genes have been reported.

ResultsSingle-trait meta-analysis of GWAS results

We conducted, a single-trait meta-analysis of GWAS for proli�cacy across six sheep breeds. Here, 29SNPs associated with proli�cacy were identi�ed. Among them, only two SNPs reached the genome-widesigni�cance level (Fig. 1). The QQ plot did not show any in�ation of the test statistics (λgc = 1.027, Fig. 2).

We identi�ed known and novel QTL across six sheep breeds. The summary of signi�cant SNPs in themeta-analysis is presented in Table 1. Consistent with previous reports, OAR3_OAR6_29362224(rs416717560), an SNP within the bone morphogenetic protein (BMP) receptor-1B (BMPR1B/FecB), wasassociated with litter size at the genome-wide signi�cance level. Also, meta-analysis identi�ed 27suggestive SNPs that some of them have been already reported in other GWASs for litter size. Theidenti�ed SNPs were scattered on various chromosomes implying different genetic mechanismsunderlying variation of proli�cacy in each breed.

Page 7: for litter size in sheep Meta-analysis of genome-wide ...

Page 7/20

Table 1Genome- and chromosome-wide signi�cant GWAS meta-analysis results.

SNP Chr Position(bp)

Major/minor

allele

Genes (15kb) Adjusted p-value (GC)

OAR3_OAR14_23199877 14 23133427 G/A LOC105608122 5.55E-07

OAR3_OAR6_29362224 6 29295803 G/A BMPR1B * 1.04E-06

OAR3_OAR2_47930582 2 47968324 G/A LOC105607546*

2.06E-06

OAR1_274928289.1 1 254159304 A/G TMEM108 * 3.71E-06

OAR3_OAR3_177468317 3 177241585 A/G LOC105614803 4.03E-06

OAR3_OAR2_132705930 2 132717299 A/G MTX2 * 4.82E-06

OAR3_OAR1_197796181 1 197637524 G/A LOC105602706 5.46E-06

OAR3_OAR22_3829898 22 3811473 C/A PCDH15 * 7.59E-06

OAR3_OAR19_59955075 19 59939988 G/C KLF15 * 8.09E-06

OAR3_OAR12_42069297 12 41992138 C/A SLC25A33 * 8.13E-06

OAR3_OAR12_42070709 12 41993550 G/A SLC25A33 * 8.13E-06

OAR3_OAR3_9708115 3 9652824 G/A MVB12B * 9.59E-06

OAR3_OAR1_10606057 1 10587862 A/G AGO1 * 9.97E-06

OAR3_OAR21_31104689 21 31046572 G/A ETS1 * 1.06E-05

OAR3_OAR2_75981497 2 76028277 G/A PTPRD * 1.22E-05

OAR3_OAR2_132690893 2 132702262 C/A MTX2 * 1.26E-05

OAR3_OAR10_2844118 10 2830516 G/A DIAPH3 * 1.46E-05

OAR3_OAR2_140444547 2 140460303 A/G B3GALT1 * 1.77E-05

S65625.1 2 140457860 A/G B3GALT1 * 1.77E-05

OAR3_OAR6_114562470 6 114447690 G/A ADRA2C 1.84E-05

OAR3_OAR13_61618860 13 61522307 A/G SUN5, BPIFB2 * 2.56E-05

OAR3_OAR2_61245281 2 61286687 G/A NMRK1,LOC106990898

2.91E-05

OAR3_OAR3_177454989 3 177228288 G/A LOC105614803 2.99E-05

SNPs with boldface are signi�cantly associated with genes but otherwise are suggestive SNPs. Star(*) shows that identi�ed SNP is located within the gene.

Page 8: for litter size in sheep Meta-analysis of genome-wide ...

Page 8/20

SNP Chr Position(bp)

Major/minor

allele

Genes (15kb) Adjusted p-value (GC)

OAR3_OAR18_43948718 18 43894548 G/A SNX6,LOC101103761

3.11E-05

OAR3_OAR1_233450483 1 233307971 A/C MBNL1 3.49E-05

OAR3_OAR5_77138988 5 77064581 G/A LOC101120408*

3.81E-05

OAR3_OAR7_1507850 7 1495859 G/A EPB41L4A * 3.83E-05

OAR3_OAR16_15290229 16 15296602 G/A RNF180 * 3.84E-05

OAR3_OAR14_9381054 14 9355176 A/G CDH13 * 3.89E-05

SNPs with boldface are signi�cantly associated with genes but otherwise are suggestive SNPs. Star(*) shows that identi�ed SNP is located within the gene.

Gene-set enrichment and functional clustering analysis

The result of the meta-analysis was complemented with gene set enrichment analysis using the GO,KEGG, and some protein databases. We also performed a functional clustering analysis to see howidenti�ed terms and pathways cluster. In total, from 564,377 SNP tested in the meta-analysis, 342,340SNP were located within or 15 kb upstream or downstream of 28,168 genes in the Oar_v.4.0 ovinereference genome assembly. About 431 genes, out of 28,168 genes contained at least one signi�cantSNP (P-value ≤ 0.001) and de�ned as signi�cantly associated genes with the litter size. GO terms andKEGG pathways with an adjusted P-value ≤ 0.05 were reported as signi�cant (Fig. 2).

Several GO terms and KEGG pathway related to signaling showed an overrepresentation of signi�cantgenes, including, signaling (GO:0023052), signal transduction (GO:0007165), trans-synaptic signaling(GO:0099537), Glucagon signaling pathway (oas04922), Calcium signaling pathway (oas04020),Phospholipase D signaling pathway (oas04072), PI3K-Akt signaling pathway (oas04151), AMPKsignaling pathway (oas04152), and Estrogen signaling pathway (oas04915). In addition, terms andpathways related to cell communication, cell junction, cell adhesion, and synapse were among thesigni�cant categories.

Enrichment analysis using protein databases showed that several genes in our gene vector, signi�cantlyenriched in categories related to epidermal growth factor-like domain (EGF-like domain) (Table 2). Itmeans that enriched genes in identi�ed protein categories contain at least one EGF-like domain in a moreor less conserved form, a sequence in which about thirty to forty amino-acid residues long is found in thesequence of epidermal growth factor. The effect of EGF in follicular development and proli�cacy hasbeen stressed (32). To identify clusters of functionally related molecular functions, the DAVID functionalannotation clustering tool was used. Clusters were ranked according to the enrichment score for each

Page 9: for litter size in sheep Meta-analysis of genome-wide ...

Page 9/20

cluster re�ecting the geometric mean of all the enrichment p-values (EASE scores) of each annotationterm in the cluster. Here, we reported the �rst cluster with an enrichment score of 1.77 (Table 3).

Table 2Enrichment results using protein databases.

Category * Term name Description Count P-value (FDR)

SMART Domains SM00181 Epidermal growth factor-like domain. 13 0.0033

UniProt Keywords KW-0245 EGF-like domain 13 0.0043

InterPro Domains IPR000742 EGF-like domain 14 0.0116

InterPro Domains IPR013032 EGF-like, conserved site 11 0.0326

Pfam PF00008 EGF-like domain 7 0.04

* Protein databases provide functional analysis of proteins by classifying them into families andpredicting domains and important sites.

This cluster contains seven categories from protein databases that four of them related to the EGFand EGF-like domain and three of them were associated with the laminin globular (G) domain (LamG). The Laminin G is about 180 amino acid long domain found in a large and diverse set ofextracellular proteins. Enrichment results using all databases are provided in Supplementary File 1,sheet 1 to sheet 4.

Table 3Functional annotation clustering result. First top annotated cluster is presented (enrichment score: 1.77).

Category Termname

Description Count P-value

SMART Domains SM00282 LamG 6 0.0008

SMART Domains SM00181 EGF 11 0.001

InterProDomains

IPR001791 Laminin G domain 6 0.001

InterProDomains

IPR013032 EGF-like, conserved site 10 0.002

InterProDomains

IPR000742 Epidermal growth factor-like domain 11 0.002

InterProDomains

IPR013320 Concanavalin A-like lectin/glucanase,subgroup

9 0.019

UniProtKeywords

KW-0245 EGF-like domain 6 0.045

PPI network construction and module identi�cation

Page 10: for litter size in sheep Meta-analysis of genome-wide ...

Page 10/20

Using the SringApp plugin, protein-protein interaction data of 431 signi�cant genes were extracted fromthe STRING v 1.6.0 database and visualized using Cytoscape. After removing genes (proteins) with noPPI information, 186 nodes and 254 edges remained in the network. The MCODE plugin was applied todiscover the hub clusters in the network. The most signi�cant module was shown in Figs. 3A and B withinthe red dashed circles. It comprises 5 nodes and 10 edges. Hub genes were selected using the cytoHubbaplugin. Figure 3 shows the networks constructed by 10 (Fig. 3A) and 15 hub genes (Fig. 3B). Figure 3Aalso shows the �rst-stage nodes. Top ten hub genes in order were CASK, PLCB4, RPTOR, GRIA2, PLCB1,GRID1, PPP3CC, PPP3CA, TMEM201, and PIK3C2G. Identi�ed hub genes were submitted for the pathwayanalysis but due to the same results compared with the full gene list, we didn’t present the results.

DiscussionIn this study, meta-analysis was used to combine the results of the six independent GWAS studies onlitter size. In recent years, several studies have been conducted to identify causal variants underlying thegenetic of economically important traits in livestock. Although genotyping costs have notably beenreduced, GWA studies in large scale size are still costly (33). The use of small sample sizes may notprovide enough power to identify variants with small effect sizes (34), and these variants require largesample sizes, especially when their allelic frequency is low in the population. In addition, the �ndings ofsmall-scale genomic studies might not be con�rmed in larger studies, which is more common incandidate gene studies (35). Meta-analysis of multiple GWA studies as an appropriate achievement hasopened ups possibly to reach enough sample size and su�cient power to identify variants with smalleffect sizes.

In this study, through meta-analysis of six GWAS results, we identi�ed two genome-wide signi�cant SNPs,OAR14_23199877 and OAR6_29362224 on chromosomes 14 and 6, respectively.

Functional annotation revealed that OAR6_29362224 is located within the BMPR1B (FecB) gene. Thisgene has previously been reported by Xu et al. (2018) for Hu sheep as candidate genes associated withlitter size. The BMPR1B encodes a protein that is a member of the intracellular serine/threonine kinasesignaling domain. The ligands of this receptor are BMPs, which are members of the TGF-betasuperfamily. Booroola (FecB) ewes have an increased ovulation rate and an average litter size of one totwo extra lambs per copy of the FecB gene (5). The BMPR1B gene can lead to an increased density of thefollicle-stimulating hormone (FSH) and luteinizing hormone (LH) receptors with a concurrent reduction inapoptosis to increase the ovulation rate of ewes (36, 37).

In addition, the results of the meta-analysis identi�ed 27 other variants that were signi�cantly associatedwith litter size at the chromosome level. Except for the ETS1 gene, other candidate genes reported by Xuet al. (2018) were not identi�ed in our meta-analysis on this data set. It has been reported that the ETS1gene affects ovulation in bovine by its link to the regulator of protein signaling protein-2 (RGS2) (38).Also, further investigation revealed that many of these SNPs are located within known ovine genes thatare associated with reproduction.

Page 11: for litter size in sheep Meta-analysis of genome-wide ...

Page 11/20

Gene-set enrichment analyses using the GO, KEGG, and some protein databases were applied tocomplement meta-analysis. GWAS is an e�cient approach to identify variants that their effects are largeenough. However, complex traits are controlled by a large number of variants with small effects, whichmake it di�cult to identify these variants in this way, even with large sample sizes (39). There are alsocausal variants located in non-coding regions that make it di�cult to understand the biologicalmechanisms that control traits. Gene set enrichment analysis provides an opportunity to address both ofthese problems (40). In this study, both GO and KEGG analyses led to the identi�cation of several termsand signaling pathways related to reproductive functions.

Both GO and KEGG enrichment identi�ed several terms and pathways related to signaling and cellcommunication processes (Fig. 2). Given the different layers that make up the follicles and the need forcommunication and transfer material through these layers to the oocyte, the signi�cance of thesepathways seems logical. Identi�ed pathways are frequently reported in previous studies (15, 16, 41, 42).PI3K-Akt signaling pathway has been identi�ed in multiple publications associated with reproduction (15,42). Various normal cellular processes, such as cell proliferation, survival, motility, growth, cytoskeletalrearrangement, and metabolism, through multiple downstream targets, are regulated by thePI3K/PTEN/Akt pathway (43). Abnormalities in the PI3K-Akt signaling pathway may associate with somestatus of female infertility attributed to primordial follicle depletion, including premature ovarian failureand primary amenorrhea (44). Studies identi�ed the Calcium signaling pathway as a signi�cant KEGGterm associated with proli�cacy (15, 42). Also, (1) reported calcium ion binding GO term as pathwayaffecting proli�cacy in Chinese sheep. Studies have demonstrated that calcium signaling has a key rolein the oocyte’s meiotic maturation (45, 46). It has been reported that the absence of glucagon signalingcan result in a poor uterine environment for fetal growth. The lack of glucagon signaling duringpregnancy was related to a decreased litter size, increased neonatal mortality, and severe intrauterinegrowth restriction concluding that glucagon signaling has a key function during mating and pregnancymaintenance (47). In addition, Ouhilal et al. (2012) reported that lack of glucagon signaling duringpregnancy alters the maternal metabolic milieu that can result in intrauterine growth restriction and fetaldemise. The Phospholipase D signaling pathway was also reported by Esmaeili-Fard et al. (2021) as asigni�cant term associated with litter size in Baluchi sheep. It is documented that AMPK is involved in theregulation of ovarian functioning in females (49). Some studies show that AMPK regulates thesteroidogenesis of granulosa cells and oocyte maturation (50–52). Leptin, adiponectin, ghrelin, andresistin as some metabolic hormones may to some extent act through the AMPK signaling. Thesehormones also have roles in the regulation of reproductive functions in both males and females throughthe hypothalamus-pituitary-gonadal axis (49). Estrogens as ovarian sex hormones are main regulators ofthe female reproductive functions and responsible for cellular proliferation and growth of tissues involvedin reproduction (53). Also, several candidate genes for reproduction such as BMPR1B, identi�ed in thepresent study, are related to the estrogen signaling pathway (54).

In this study, functional annotation clustering led to a cluster containing seven categories among whichfour of them related to the EGF and EGF-like domain and three of them associated with the lamininglobular (G) domain (Lam G). Shimada et al. (2016) identi�ed the epidermal growth factor (EGF)-like

Page 12: for litter size in sheep Meta-analysis of genome-wide ...

Page 12/20

factor that affects EGF receptor and then induces the ERK1/2 and Ca2+‐PLC pathways in cumulus cells.Down‐regulation of these two pathways leads to the notable suppression of induction of cumulusexpansion and oocyte maturation implying that both pathways are inducers of the ovulation process.

In the present study, protein-protein interaction data were extracted and used to discover the hub genesand module identi�cation. The �rst three identi�ed hub genes include CASK, PLCB4, and RPTOR. TheCASK gene encodes a calcium/calmodulin-dependent serine protein kinase that is a member of themembrane-associated guanylate kinase (MAGUK) protein family. MAGUKs are scaffolding proteinsassociated with intercellular junctions (56). It may mediate a link between the extracellular matrix and theactin cytoskeleton via its interaction with syndecan and with the actin/spectrin-binding protein. It hasbeen shown that CASK regulates the proliferation and adhesion of epidermal keratinocytes (57). Oliver etal. (2019) in a GWA analysis identi�ed PLCB4 as a master regulator associated with spontaneousabortion in cattle. PLCB4 belongs to the phospholipase C-beta family and is associated with the samefamily as the leading-edge, PLCB1. As a master regulator, PLCB4 regulates PLCB1 and has a function incalcium signaling (59), which is essential for appropriate uterine quiescence and smooth musclecontractions during gestation (58). In addition, PLCB4 plays a role in the embryonic development ofinternal and external facial structures (60), with disruptions in development possibly driving to fetal deathif development is compromised (58). Notably, PLCB4 was enriched in most of the identi�ed pathwaysrelated to signaling. Another identi�ed hub gene, RPTOR (Regulatory Associated Protein of MTORComplex 1), encodes part of a signaling pathway regulating cell growth and survival, and autophagy inresponse to nutrient and hormonal signals (insulin levels). This gene encodes an evolutionarily conservedprotein with multiple roles in the mTOR pathway (61).

ConclusionsIn conclusion, we performed the �rst GWAS meta-analysis of litter size in six sheep populations. Weidenti�ed BMPR1B, a gene with con�rmed roles in proli�cacy. We also identi�ed 26 loci that were notidenti�ed in the independent studies. Most of the signi�cant genes were enriched by biological pathwaysrelated to cell signaling, communication, and adhesion. Given the different layers that make up thefollicles and the need for communication and transfer material through these layers to the oocyte, thesigni�cance of these pathways seems logical. Enrichment using protein databases and functionalclustering identi�ed the Epidermal growth factor-like domain in association with litter size. Also, ournetwork analysis, identi�ed hub genes like CASK, PLCB4, and RPTOR that were enriched in most of thesigni�cant pathways. These results can provide further knowledge and a better understanding of geneticvariants and genes underlying litter size, a complex trait in sheep.

Declarations

AcknowledgmentsNot applicable.

Page 13: for litter size in sheep Meta-analysis of genome-wide ...

Page 13/20

Supplementary materialSupplementary File1 (sheet 1). KEGG pathways signi�cantly (P ≤ 0.05) enriched using genes associatedwith litter size.

Supplementary File1 (sheet 2). GO terms. GO terms signi�cantly (P ≤ 0.05) enriched using genesassociated with litter size in Baluchi sheep.

Supplementary File1 (sheet 3). Protein categories were signi�cantly (P ≤ 0.05) enriched using genesassociated with litter size.

Supplementary File1 (sheet 4). Functional clustering result.

Supplementary File1 (sheet 5). Identi�ed signi�cant genes used for enrichment and network analyses.

Availability of data and materialsThe datasets analyzed during the current study are available onhttps://www.animalgenome.org/repository/pub/CAAS2018.0302/.

Competing InterestThe authors declare that they have no competing interests

Financial support statementThis research received no speci�c grant from any funding agency, commercial or not-for-pro�t section

Ethics approvalNot applicable.

Consent for publicationNot applicable.

Author Contributions

Page 14: for litter size in sheep Meta-analysis of genome-wide ...

Page 14/20

Mohsen. Gholizadeh: Project administration, Conceptualization, Resources, Writing – original draft,Writing – review & editing.

Seyed Mehdi Esmaeili-Fard: Conceptualization, Methodology, Formal analysis, Software, Visualization,Writing – original draft, Writing – review & editing. All authors read and approved the �nal manuscript.

References1. Xu S-S, Gao L, Xie X-L, Ren Y-L, Shen Z-Q, Wang F, et al. Genome-wide association analyses highlight

the potential for different genetic mechanisms for litter size among sheep breeds. Front Genet.2018;9:118.

2. Smołucha G, Gurgul A, Jasielczuk I, Kawęcka A, Miksza-Cybulska A. A genome-wide associationstudy for proli�cacy in three Polish sheep breeds. J Appl Genet. 2021;62(2):323–6.

3. Souza CJH, MacDougall C, Campbell BK, McNeilly AS, Baird DT. The Booroola (FecB) phenotype isassociated with a mutation in the bone morphogenetic receptor type 1 B (BMPR1B) gene. JEndocrinol. 2001;169(2):R1.

4. Mulsant P, Lecerf F, Fabre S, Schibler L, Monget P, Lanneluc I, et al. Mutation in bone morphogeneticprotein receptor-IB is associated with increased ovulation rate in Booroola Merino ewes. Proc NatlAcad Sci. 2001;98(9):5104–9.

5. Wilson T, Wu X-Y, Juengel JL, Ross IK, Lumsden JM, Lord EA, et al. Highly proli�c Booroola sheephave a mutation in the intracellular kinase domain of bone morphogenetic protein IB receptor (ALK-6)that is expressed in both oocytes and granulosa cells. Biol Reprod. 2001;64(4):1225–35.

�. Galloway SM, McNatty KP, Cambridge LM, Laitinen MPE, Juengel JL, Jokiranta TS, et al. Mutationsin an oocyte-derived growth factor gene (BMP15) cause increased ovulation rate and infertility in adosage-sensitive manner. Nat Genet. 2000;25(3):279.

7. Hanrahan JP, Gregan SM, Mulsant P, Mullen M, Davis GH, Powell R, et al. Mutations in the genes foroocyte-derived growth factors GDF9 and BMP15 are associated with both increased ovulation rateand sterility in Cambridge and Belclare sheep (Ovis aries). Biol Reprod. 2004;70(4):900–9.

�. Drouilhet L, Mansanet C, Sarry J, Tabet K, Bardou P, Woloszyn F, et al. The highly proli�c phenotypeof Lacaune sheep is associated with an ectopic expression of the B4GALNT2 gene within the ovary.PLoS Genet. 2013;9(9):e1003809.

9. Meuwissen THE, Hayes BJ, Goddard ME. Prediction of total genetic value using genome-wide densemarker maps. Genetics. 2001;157(4):1819–29.

10. Goddard ME, Hayes BJ, Meuwissen THE. Genomic selection in livestock populations. Genet Res(Camb). 2010;92(5–6):413–21.

11. Demars J, Fabre S, Sarry J, Rossetti R, Gilbert H, Persani L, et al. Genome-wide association studiesidentify two novel BMP15 mutations responsible for an atypical hyperproli�cacy phenotype in sheep.PLoS Genet. 2013;9(4):e1003482.

Page 15: for litter size in sheep Meta-analysis of genome-wide ...

Page 15/20

12. Gholizadeh M, Rahimi-Mianji G, Nejati-Javaremi A, De Koning DJ, Jonas E. Genomewide associationstudy to detect QTL for twinning rate in Baluchi sheep. J Genet. 2014 Aug;93(2):489–93.

13. Benavides M V, Souza H. How e�ciently Genome-Wide Association Studies (GWAS) identifyproli�city-determining genes in sheep. Genet Mol Res. 2018;17(2).

14. Calvo JH, Chantepie L, Serrano M, Sarto MP, Iguacel LP, Jiménez MÁ, et al. A new allele in the BMP15gene (FecXRA) that affects proli�cacy co-segregates with FecXR and FecXGR in Rasa aragonesasheep. Theriogenology. 2020;144:107–11.

15. Esmaeili-Fard SM, Gholizadeh M, Hafezian SH, Abdollahi-Arpanahi R. Genome-wide associationstudy and pathway analysis identify NTRK2 as a novel candidate gene for litter size in sheep. PLoSOne. 2021;16(1):e0244408.

1�. Xia Q, Li Q, Gan S, Guo X, Zhang X, Zhang J, et al. Exploring the roles of fecundity-related long non-coding RNAs and mRNAs in the adrenal glands of small-tailed Han Sheep. BMC Genet. 2020;21:1–11.

17. Zeggini E, Ioannidis JPA. Meta-analysis in genome-wide association studies. 2009;

1�. Aulchenko YS, Ripke S, Isaacs A, Van Duijn CM. GenABEL: an R library for genome-wide associationanalysis. Bioinformatics. 2007;23(10):1294–6.

19. Purcell S, Neale B, Todd-Brown K, Thomas L, Ferreira MAR, Bender D, et al. PLINK: a tool set forwhole-genome association and population-based linkage analyses. Am J Hum Genet.2007;81(3):559–75.

20. Willer CJ, Li Y, Abecasis GR. METAL: fast and e�cient meta-analysis of genomewide associationscans. Bioinformatics. 2010;26(17):2190–1.

21. Pereira T V, Patsopoulos NA, Salanti G, Ioannidis JPA. Discovery properties of genome-wideassociation signals from cumulatively combined data sets. Am J Epidemiol. 2009;170(10):1197–206.

22. Yu G, Wang L-G, Han Y, He Q-Y. clusterPro�ler: an R package for comparing biological themes amonggene clusters. Omi a J Integr Biol. 2012;16(5):284–7.

23. Raudvere U, Kolberg L, Kuzmin I, Arak T, Adler P, Peterson H, et al. g: Pro�ler: a web server forfunctional enrichment analysis and conversions of gene lists (2019 update). Nucleic Acids Res.2019;47(W1):W191–8.

24. Wickham H. ggplot2: elegant graphics for data analysis. Springer; 2016.

25. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, et al. Cytoscape: a softwareenvironment for integrated models of biomolecular interaction networks. Genome Res.2003;13(11):2498–504.

2�. Doncheva NT, Morris JH, Gorodkin J, Jensen LJ. Cytoscape StringApp: network analysis andvisualization of proteomics data. J Proteome Res. 2018;18(2):623–32.

27. Huang DW, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: paths toward thecomprehensive functional analysis of large gene lists. Nucleic Acids Res. 2009;37(1):1–13.

Page 16: for litter size in sheep Meta-analysis of genome-wide ...

Page 16/20

2�. Huang DW, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists usingDAVID bioinformatics resources. Nat Protoc. 2009;4(1):44.

29. Szklarczyk D, Gable AL, Lyon D, Junge A, Wyder S, Huerta-Cepas J, et al. STRING v11: protein–proteinassociation networks with increased coverage, supporting functional discovery in genome-wideexperimental datasets. Nucleic Acids Res. 2019;47(D1):D607–13.

30. Bader GD, Hogue CW V. An automated method for �nding molecular complexes in large proteininteraction networks. BMC Bioinformatics [Internet]. 2003/01/13. 2003 Jan 13;4:2. Available from:https://pubmed.ncbi.nlm.nih.gov/12525261

31. Chin C-H, Chen S-H, Wu H-H, Ho C-W, Ko M-T, Lin C-Y. cytoHubba: identifying hub objects and sub-networks from complex interactome. BMC Syst Biol. 2014;8(4):1–7.

32. Rajkovic A, Pangas SA, Matzuk MM. Follicular development: mouse, sheep, and human models.Knobil Neill’s Physiol Reprod. 2006;1:383–424.

33. Sargolzaei M, Schenkel FS. QMSim: a large-scale genome simulator for livestock. Bioinformatics.2009;25(5):680–1.

34. Seminara D, Khoury MJ, O’brien TR, Manolio T, Gwinn ML, Little J, et al. The Emergence of Networksin Human Genome Epidemiology:" Challenges and Opportunities". Epidemiology. 2007;18(1):1–8.

35. Panagiotou OA, Evangelou E, Ioannidis JPA. Genome-wide signi�cant associations for variants withminor allele frequency of 5% or less—an overview: a HuGE review. Am J Epidemiol. 2010;172(8):869–89.

3�. Regan SL, McFarlane JR, O’Shea T, Andronicos N, Arfuso F, Dharmarajan A, et al. Flow cytometricanalysis of FSHR, BMRR1B, LHR and apoptosis in granulosa cells and ovulation rate in merinosheep. Reproduction. 2015;150(2):151–63.

37. Hu X, Pokharel K, Peippo J, Ghanem N, Zhaboyev I, Kantanen J, et al. Identi�cation andcharacterization of mi RNA s in the ovaries of a highly proli�c sheep breed. Anim Genet.2016;47(2):234–9.

3�. Sayasith K, Sirois J, Lussier JG. Expression and regulation of regulator of G-protein signaling protein-2 (RGS2) in equine and bovine follicles prior to ovulation: molecular characterization of RGS2transactivation in bovine granulosa cells. Biol Reprod. 2014;91(6):131–9.

39. Sham PC, Purcell SM. Statistical power and signi�cance testing in large-scale genetic studies. NatRev Genet. 2014;15(5):335–46.

40. Zhu X, Stephens M. Large-scale genome-wide enrichment analyses identify new trait-associatedgenes and pathways across 31 human phenotypes. Nat Commun. 2018;9(1):1–14.

41. Esmaeili-Fard M, Hafezian SH, Gholizadeh M, Arpanahi RA. Gene set enrichment analysis usinggenome-wide association study to identify genes and biological pathways associated with twinningin Baluchi sheep. Anim Prod Res. 2019;8(2).

42. Hernández-Montiel W, Martínez-Núñez MA, Ramón-Ugalde JP, Román-Ponce SI, Calderón-Chagoya R,Zamora-Bustillos R. Genome-Wide Association Study Reveals Candidate Genes for Litter Size Traitsin Pelibuey Sheep. Animals. 2020;10(3):434.

Page 17: for litter size in sheep Meta-analysis of genome-wide ...

Page 17/20

43. Makker A, Goel MM, Mahdi AA. PI3K/PTEN/Akt and TSC/mTOR signaling pathways, ovariandysfunction, and infertility: an update. J Mol Endocrinol. 2014;53(3):R103–18.

44. Adhikari D, Gorre N, Risal S, Zhao Z, Zhang H, Shen Y, et al. The safe use of a PTEN inhibitor for theactivation of dormant mouse primordial follicles and generation of fertilizable eggs. PLoS One.2012;7(6):e39034.

45. Tosti E. Calcium ion currents mediating oocyte maturation events. Reprod Biol Endocrinol.2006;4(1):1–9.

4�. Homa ST. Calcium and meiotic maturation of the mammalian oocyte. Mol Reprod Dev.1995;40(1):122–34.

47. Vuguin PM, Kedees MH, Cui L, Guz Y, Gelling RW, Nejathaim M, et al. Ablation of the glucagonreceptor gene increases fetal lethality and produces alterations in islet development and maturation.Endocrinology. 2006;147(9):3995–4006.

4�. Ouhilal S, Vuguin P, Cui L, Du X-Q, Gelling RW, Reznik SE, et al. Hypoglycemia, hyperglucagonemia,and fetoplacental defects in glucagon receptor knockout mice: a role for glucagon action inpregnancy maintenance. Am J Physiol Metab. 2012;302(5):E522–31.

49. Tosca L, Chabrolle C, Dupont J. L’AMPK: un lien entre métabolisme et reproduction?médecine/sciences. 2008;24(3):297–300.

50. Tosca L, Chabrolle C, Uzbekova S, Dupont J. Effects of metformin on bovine granulosa cellssteroidogenesis: possible involvement of adenosine 5′ monophosphate-activated protein kinase(AMPK). Biol Reprod. 2007;76(3):368–78.

51. Tosca L, Crochet S, Ferré P, Foufelle F, Tesseraud S, Dupont J. AMP-activated protein kinaseactivation modulates progesterone secretion in granulosa cells from hen preovulatory follicles. JEndocrinol. 2006;190(1):85–97.

52. Tosca L, Froment P, Solnais P, Ferré P, Foufelle F, Dupont J. Adenosine 5′-monophosphate-activatedprotein kinase regulates progesterone secretion in rat granulosa cells. Endocrinology.2005;146(10):4500–13.

53. Vrtačnik P, Ostanek B, Mencej-Bedrač S, Marc J. The many faces of estrogen signaling. Biochemmedica. 2014;24(3):329–42.

54. Mendoza N, Castro JER, Sánchez Borrego R. A multigenic combination of estrogen related genes areassociated with the duration of fertility period in the Spanish population. Gynecol Endocrinol.2013;29(3):235–7.

55. Shimada M, Umehara T, Hoshino Y. Roles of epidermal growth factor (EGF)-like factor in theovulation process. Reprod Med Biol. 2016;15(4):201–16.

5�. Atasoy D, Schoch S, Ho A, Nadasy KA, Liu X, Zhang W, et al. Deletion of CASK in mice is lethal andimpairs synaptic function. Proc Natl Acad Sci. 2007;104(7):2525–30.

57. Ojeh N, Pekovic V, Jahoda C, Määttä A. The MAGUK-family protein CASK is targeted to nuclei of thebasal epidermis and controls keratinocyte proliferation. J Cell Sci. 2008;121(16):2705–17.

Page 18: for litter size in sheep Meta-analysis of genome-wide ...

Page 18/20

5�. Oliver KF, Wahl AM, Dick M, Toenges JA, Kiser JN, Galliou JM, et al. Genomic Analysis ofSpontaneous Abortion in Holstein Heifers and Primiparous Cows. Genes (Basel). 2019;10(12):954.

59. Rebecchi MJ, Pentyala SN. Structure, function, and control of phosphoinositide-speci�cphospholipase C. Physiol Rev. 2000;80(4):1291–335.

�0. Clouthier DE, Passos-Bueno MR, Tavares ALP, Lyonnet S, Amiel J, Gordon CT. Understanding thebasis of auriculocondylar syndrome: Insights from human, mouse and zebra�sh genetic studies. In:American Journal of Medical Genetics Part C: Seminars in Medical Genetics. Wiley Online Library;2013. p. 306–17.

�1. Kim D-H, Sarbassov DD, Ali SM, King JE, Latek RR, Erdjument-Bromage H, et al. mTOR interacts withraptor to form a nutrient-sensitive complex that signals to the cell growth machinery. Cell.2002;110(2):163–75.

Figures

Figure 1

Figure 1. (A) Manhattan plot. X-axis: SNPs positions on chromosomes, Y-axis: -Log10 P-value. The reddashed lines indicate the threshold for genome- and chromosome-wide signi�cance. (B) QQ plot for themeta-analysis.

Page 19: for litter size in sheep Meta-analysis of genome-wide ...

Page 19/20

Figure 2

Figure 2. Top GO terms and KEGG pathway signi�cantly (p ≤ 0.05) enriched with genes associated withaverage litter size in meta-analysis. A full list of associated terms and pathways is provided in Additional�le1: sheet 1 and sheet 2.

Page 20: for litter size in sheep Meta-analysis of genome-wide ...

Page 20/20

Figure 3

Figure 3. PPI network based on the identi�ed hub genes using cytoHubba plugin. (A) Network constructedby 10 hub genes and �rst-stage nodes. (B) Network constructed by 15 hub genes. Red dashed circlesshow the most signi�cant module identi�ed by MCODE. The color depth of the nodes was �lledaccording to the degree of relatedness.

Supplementary Files

This is a list of supplementary �les associated with this preprint. Click to download.

SupplementaryFile1.xlsx

formula.docx