Copy Number Variation of HLA-DQA1 and APOBEC3A/3B ...Systemic sclerosis (SSc), also called...

7
880 The Journal of Rheumatology 2016; 43:5; doi:10.3899/jrheum.150945 Personal non-commercial use only. The Journal of Rheumatology Copyright © 2016. All rights reserved. Copy Number Variation of HLA-DQA1 and APOBEC3A/3B Contribute to the Susceptibility of Systemic Sclerosis in the Chinese Han Population Shicheng Guo, Yuan Li, Yi Wang, Haiyan Chu, Yulin Chen, Qingmei Liu, Gang Guo, Wenzhen Tu, Wenyu Wu, Hejian Zou, Li Yang, Rong Xiao, Yanyun Ma, Feng Zhang, Momiao Xiong, Li Jin, Xiaodong Zhou, and Jiucun Wang ABSTRACT. Objective. Systemic sclerosis (SSc) is a systemic connective tissue disease caused by a genetic aberrant. The involvement of the copy number variations (CNV) in the pathogenesis of SSc is unclear. We tried to identify some CNV that are involved with the susceptibility to SSc. Methods. A genome-wide CNV screening was performed in 20 patients with SSc. Five SSc-associated common CNV that included HLA-DRB5, HLA-DQA1, IRGM, CDC42EP3, and APOBEC3A/3B were identified from the screening and were then validated in 365 patients with SSc and 369 matched healthy controls. Results. Three hundred forty-four CNV (140 gains and 204 losses) and 2 CNV hotspots (6q21.3 and 22q11.2) were found in the SSc genomes (covering 24.2 megabases), suggesting that CNV were ubiquitous in the SSc genome and played important roles in the pathogenesis of SSc. The high copy number of HLA-DQA1 was a significantly protective factor for SSc (OR 0.07, p = 2.99 × 10 –17 ), while the high copy number of APOBEC3A/B was a significant risk factor (OR 3.45, p = 6.4 × 10 –18 ), adjusted with sex and age. The risk prediction model based on genetic factors in logistic regression showed moderate prediction ability, with area under the curve = 0.80 (95% CI 0.77–0.83), which demonstrated that APOBEC3A/B and HLA-DQA1 were powerful biomarkers for SSc risk evaluation and contributed to the susceptibility to SSc. Conclusion. CNV of HLA-DQA1 and APOBEC3A/B contribute to the susceptibility to SSc in a Chinese Han population. (First Release April 1 2016; J Rheumatol 2016;43:880–6; doi:10.3899/ jrheum.150945) Key Indexing Terms: SYSTEMIC SCLEROSIS GENETIC PREDISPOSITION TO DISEASE HLA ANTIGENS COPY NUMBER VARIATION CASE-CONTROL STUDIES From the State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University; Institute of Rheumatology, Immunology and Allergy, Fudan University; Shanghai Traditional Chinese Medicine (TCM)-Integrated Hospital; Division of Dermatology, and Division of Rheumatology, Huashan Hospital, Fudan University, Shanghai; Yiling Hospital, Shijiazhuang; Division of Rheumatology, Teaching Hospital of Chengdu University of TCM, Chengdu; Department of Dermatology, Second Xiangya Hospital, Central South University, Changsha, China; School of Public Health, and Medical School at Houston, University of Texas, Houston, Texas, USA. Supported by research grants from the National Basic Research Program (2012CB944604), National Science Foundation of China (81270120, 81470254), International S&T Cooperation Program of China (2013DFA30870), the 111 Project (B13016), and the US National Institutes of Health (NIH) NIAID UO1, 1U01AI09090. The computations involved in this study were supported by Fudan University High-End Computing Center. Mr. Xiong was supported by Grant 1R01AR057120–01 and 1R01HL106034-01, from the NIH and the US National Heart, Lung, and Blood Institute. S. Guo, PhD, State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University, and Institute of Rheumatology, Immunology and Allergy, Fudan University; Y. Li, MS, State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University; Y. Wang, PhD, State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University; H. Chu, PhD, State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University; Y. Chen, PhD, State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University; Q. Liu, PhD, State Key Laboratory of Genetic Engineering and Ministry of Education Key Laboratory of Contemporary Anthropology, Collaborative Innovation Center for Genetics and Development, School of Life Sciences and Institutes of Biomedical Sciences, Fudan University; G. Guo, MD, Yiling Hospital; W. Tu, MD, Shanghai TCM-Integrated Hospital; W. Wu, MD, Institute of Rheumatology, Immunology and Allergy, Fudan University, and Division of Dermatology, Huashan Hospital, Fudan University; H. Zou, PhD, MD, Institute of Rheumatology, Immunology and Allergy, Fudan University, and Division of Rheumatology, Huashan Hospital, Fudan University; L. Yang, MD, Division of Rheumatology, Teaching www.jrheum.org Downloaded on May 20, 2021 from

Transcript of Copy Number Variation of HLA-DQA1 and APOBEC3A/3B ...Systemic sclerosis (SSc), also called...

Page 1: Copy Number Variation of HLA-DQA1 and APOBEC3A/3B ...Systemic sclerosis (SSc), also called scleroderma, is an immune-mediated disease characterized by extensive fibrosis of the skin

880 The Journal of Rheumatology 2016; 43:5; doi:10.3899/jrheum.150945

Personal non-commercial use only. The Journal of Rheumatology Copyright © 2016. All rights reserved.

Copy Number Variation of HLA-DQA1 andAPOBEC3A/3B Contribute to the Susceptibility ofSystemic Sclerosis in the Chinese Han PopulationShicheng Guo, Yuan Li, Yi Wang, Haiyan Chu, Yulin Chen, Qingmei Liu, Gang Guo, Wenzhen Tu, Wenyu Wu, Hejian Zou, Li Yang, Rong Xiao, Yanyun Ma, Feng Zhang, Momiao Xiong, Li Jin, Xiaodong Zhou, and Jiucun Wang

ABSTRACT. Objective. Systemic sclerosis (SSc) is a systemic connective tissue disease caused by a geneticaberrant. The involvement of the copy number variations (CNV) in the pathogenesis of SSc is unclear.We tried to identify some CNV that are involved with the susceptibility to SSc.Methods.A genome-wide CNV screening was performed in 20 patients with SSc. Five SSc-associatedcommon CNV that included HLA-DRB5, HLA-DQA1, IRGM, CDC42EP3, and APOBEC3A/3Bwereidentified from the screening and were then validated in 365 patients with SSc and 369 matchedhealthy controls.Results. Three hundred forty-four CNV (140 gains and 204 losses) and 2 CNV hotspots (6q21.3 and22q11.2) were found in the SSc genomes (covering 24.2 megabases), suggesting that CNV wereubiquitous in the SSc genome and played important roles in the pathogenesis of SSc. The high copynumber of HLA-DQA1was a significantly protective factor for SSc (OR 0.07, p = 2.99 × 10–17), whilethe high copy number of APOBEC3A/B was a significant risk factor (OR 3.45, p = 6.4 × 10–18),adjusted with sex and age. The risk prediction model based on genetic factors in logistic regressionshowed moderate prediction ability, with area under the curve = 0.80 (95% CI 0.77–0.83), whichdemonstrated that APOBEC3A/B and HLA-DQA1 were powerful biomarkers for SSc risk evaluationand contributed to the susceptibility to SSc.Conclusion. CNV of HLA-DQA1 and APOBEC3A/B contribute to the susceptibility to SSc in aChinese Han population. (First Release April 1 2016; J Rheumatol 2016;43:880–6; doi:10.3899/jrheum.150945)

Key Indexing Terms:SYSTEMIC SCLEROSIS GENETIC PREDISPOSITION TO DISEASE HLA ANTIGENSCOPY NUMBER VARIATION CASE-CONTROL STUDIES

From the State Key Laboratory of Genetic Engineering and Ministry ofEducation Key Laboratory of Contemporary Anthropology, CollaborativeInnovation Center for Genetics and Development, School of Life Sciencesand Institutes of Biomedical Sciences, Fudan University; Institute ofRheumatology, Immunology and Allergy, Fudan University; ShanghaiTraditional Chinese Medicine (TCM)-Integrated Hospital; Division ofDermatology, and Division of Rheumatology, Huashan Hospital, FudanUniversity, Shanghai; Yiling Hospital, Shijiazhuang; Division ofRheumatology, Teaching Hospital of Chengdu University of TCM,Chengdu; Department of Dermatology, Second Xiangya Hospital, CentralSouth University, Changsha, China; School of Public Health, and MedicalSchool at Houston, University of Texas, Houston, Texas, USA.Supported by research grants from the National Basic Research Program(2012CB944604), National Science Foundation of China (81270120,81470254), International S&T Cooperation Program of China(2013DFA30870), the 111 Project (B13016), and the US National Institutesof Health (NIH) NIAID UO1, 1U01AI09090. The computations involved inthis study were supported by Fudan University High-End ComputingCenter. Mr. Xiong was supported by Grant 1R01AR057120–01 and1R01HL106034-01, from the NIH and the US National Heart, Lung, andBlood Institute.S. Guo, PhD, State Key Laboratory of Genetic Engineering and Ministryof Education Key Laboratory of Contemporary Anthropology,Collaborative Innovation Center for Genetics and Development, School ofLife Sciences and Institutes of Biomedical Sciences, Fudan University, andInstitute of Rheumatology, Immunology and Allergy, Fudan University;

Y. Li, MS, State Key Laboratory of Genetic Engineering and Ministry ofEducation Key Laboratory of Contemporary Anthropology, CollaborativeInnovation Center for Genetics and Development, School of Life Sciencesand Institutes of Biomedical Sciences, Fudan University; Y. Wang, PhD,State Key Laboratory of Genetic Engineering and Ministry of EducationKey Laboratory of Contemporary Anthropology, Collaborative InnovationCenter for Genetics and Development, School of Life Sciences andInstitutes of Biomedical Sciences, Fudan University; H. Chu, PhD, StateKey Laboratory of Genetic Engineering and Ministry of Education KeyLaboratory of Contemporary Anthropology, Collaborative InnovationCenter for Genetics and Development, School of Life Sciences andInstitutes of Biomedical Sciences, Fudan University; Y. Chen, PhD, StateKey Laboratory of Genetic Engineering and Ministry of Education KeyLaboratory of Contemporary Anthropology, Collaborative InnovationCenter for Genetics and Development, School of Life Sciences andInstitutes of Biomedical Sciences, Fudan University; Q. Liu, PhD, StateKey Laboratory of Genetic Engineering and Ministry of Education KeyLaboratory of Contemporary Anthropology, Collaborative InnovationCenter for Genetics and Development, School of Life Sciences andInstitutes of Biomedical Sciences, Fudan University; G. Guo, MD, YilingHospital; W. Tu, MD, Shanghai TCM-Integrated Hospital; W. Wu, MD,Institute of Rheumatology, Immunology and Allergy, Fudan University,and Division of Dermatology, Huashan Hospital, Fudan University; H. Zou, PhD, MD, Institute of Rheumatology, Immunology and Allergy,Fudan University, and Division of Rheumatology, Huashan Hospital,Fudan University; L. Yang, MD, Division of Rheumatology, Teaching

www.jrheum.orgDownloaded on May 20, 2021 from

Page 2: Copy Number Variation of HLA-DQA1 and APOBEC3A/3B ...Systemic sclerosis (SSc), also called scleroderma, is an immune-mediated disease characterized by extensive fibrosis of the skin

Hospital of Chengdu University of TCM; R. Xiao, MD, Department ofDermatology, Second Xiangya Hospital, Central South University; Y. Ma,MS, State Key Laboratory of Genetic Engineering and Ministry ofEducation Key Laboratory of Contemporary Anthropology, CollaborativeInnovation Center for Genetics and Development, School of Life Sciencesand Institutes of Biomedical Sciences, Fudan University; F. Zhang, PhD,State Key Laboratory of Genetic Engineering and Ministry of EducationKey Laboratory of Contemporary Anthropology, Collaborative InnovationCenter for Genetics and Development, School of Life Sciences andInstitutes of Biomedical Sciences, Fudan University; M. Xiong, PhD,School of Public Health, University of Texas; L. Jin, PhD, State KeyLaboratory of Genetic Engineering and Ministry of Education KeyLaboratory of Contemporary Anthropology, Collaborative InnovationCenter for Genetics and Development, School of Life Sciences andInstitutes of Biomedical Sciences, Fudan University; X. Zhou, MD,Medical School at Houston, University of Texas; J. Wang, PhD, State KeyLaboratory of Genetic Engineering and Ministry of Education KeyLaboratory of Contemporary Anthropology, Collaborative InnovationCenter for Genetics and Development, School of Life Sciences andInstitutes of Biomedical Sciences, Fudan University, and Institute ofRheumatology, Immunology and Allergy, Fudan University.Address correspondence to J. Wang, School of Life Sciences, FudanUniversity Jiangwan Campus, 2005 Songhu Road, Shanghai 200438,China. E-mail: [email protected] for publication January 26, 2016.

Systemic sclerosis (SSc), also called scleroderma, is animmune-mediated disease characterized by extensive fibrosisof the skin and associated with various degrees of chronicinflammatory infiltration and significant microangiopathy,and changes in humoral or cellular immune system1,2.According to the degree and extent of fibrosis, there are 2major clinical subtypes: limited cutaneous SSc (lcSSc) anddiffuse cutaneous SSc (dcSSc).

Large numbers of the epidemiological characteristics weresignificantly different between the populations with differentgenetic architecture. For example, the incidence of SScranged from 3.7–23 per 100,000 population in differentethnic groups3,4; the prevalence of SSc for women is 4-foldto 5-fold higher than that for the male population5,6. Arnett,et al7 and Mayes, et al5 found that siblings had about a15-fold higher risk of SSc, while first-degree relatives hadabout a 13-fold higher risk of SSc. In addition, analysis ofSSc in twins reveals low concordance for the disease.However, high concordance for the presence of antinuclearantibodies (ANA) was observed8. This evidence shows thatSSc is a complex disease caused by specific genetic andgenomic variants9. Several genome-wide associationfollowup studies10,11,12,13 and case-control studies haveshown multiple susceptible single-nucleotide polymorphisms(SNP) associated with susceptibility to SSc, such asPPARG13, IRF514, transforming growth factor-β receptor15,TNIP116, STAT417, RHOB, and more. However, we demon-strated previously that even for high familial risk diseasessuch as thyroid cancer, a few of the significant SNP couldjust have limited prediction power18. Some other genetic orepigenetic variation should be discovered, such as copynumber variation (CNV)19. CNV is one of most importantsources of genetic structural diversity in the human genome;it can cause gene structure and accordingly gene expression

change. Evidence showed that at least 8.75% to 17.7% of thevariation in gene expression could be explained by CNV20.Some CNV have been demonstrated to be widely associatedwith susceptibility to immune-mediated diseases such asankylosing spondylitis (AS)21, rheumatoid arthritis22,23,24,systemic lupus erythematosus25,26, and others. Someevidence has demonstrated that specific CNV wereassociated with SSc27.

In our present study, we analyzed potential SSc-associatedCNV with Agilent array comparative genomic hybridization(aCGH) and AccuCopy CNV genotyping technologies28.Genome-wide aCGH microarrays were conducted to detectCNV in 20 patients with SSc, and then 5 candidate CNV thatwere suspected of being related to the pathology of SScincluding HLA-DRB5, HLA-DQA1, IRGM, CDC42EP3, andAPOBEC3A/3B. These CNV were validated in a largeChinese Han population.

MATERIALS AND METHODSPatients and controls. SSc mainly includes 2 subtypes: lcSSc and dcSSc. Inthe discovery stage, we covered both of them in 5 male and 5 female samplesso that we could get some unbiased candidate CNV associated with SSc andadjusted for sex and subtype of SSc in a Chinese Han population (4categories with sex and subtypes) for the aCGH array screening. In thevalidation stage, a total of 734 subjects were enrolled, including 365 caseswith SSc and 369 ethnically matched healthy controls. All patients wererecruited from a multicenter study that included hospitals and outpatientclinics in Shanghai, Hebei province, Sichuan province, and Hunan provincein China. Patients either met the American College of Rheumatology classi-fication criteria for SSc29 or had at least 3 out of 5 CREST features (calci-nosis, Raynaud phenomenon, esophageal dysmotility, sclerodactyly, andtelangiectasia) with sclerodactyly being mandatory30.DNA extraction, autoantibody test, and organ involvement assessment.Peripheral blood was collected from all subjects. Genomic DNA was isolatedfrom whole blood and stored at –30°C until used in our previous study17.The following autoantibodies were detected in patient sera: ANA, anti-DNAtopoisomerase I (ATA), anticentromere (ACA), anti-U1RNP (ARA),anti-RNA polymerase 3 (anti-RNAP 3), anti-Sm, anti-SSA, anti-SSB,anti-PM-1, anti-Jo1 antibodies, and rheumatoid factor. The status of theautoantibodies was measured as binary data: positive and negative, exceptanti-RNAP 3, which had continuous data. In addition, autoantibody detectiondetails could be found in our previous work6. Pulmonary fibrosis wasassessed with radiograph and/or computed tomography. Organ involvementwas defined as Steen and Medsger suggested31.Genome-wide CNV analysis. As in our previous study21, Agilent SurePrintG3 Human CGH 1×1M Oligo Microarray was performed for genome-wideCNV detection and genotyping. DNA was isolated from 20 patients withSSc. For each sample, 2.2 µg input genomic DNA was restriction-digestedand labeled with ULS-Cy5 and ULS-Cy3 in accordance with the manufac-turer’s protocol (Agilent 2010). The labeled product was then hybridized tothe array and scanned on the Agilent Microarray Scanner. The data wereextracted by Agilent Feature Extraction 10.7.3.1 and analyzed by AgilentWorkbench 7.0 with default variables. A human genome coordinate wasconversed in hg19 uniformly in our present study.AccuCopy technology for CNV validation. Five common CNV resultingfrom aCGH were validated with the AccuCopy assay28 (a multiple compet-itive real-time PCR) by Genesky Bio-Tech. Briefly, the genomic DNA ofeach subject was mixed with fluorescence-labeled specific primers(Supplementary Table 1, available from the authors on request), PCR MasterMix, and a competitive DNA with known copy number for a multiple

881Guo, et al: CNV susceptibility to SSc

Personal non-commercial use only. The Journal of Rheumatology Copyright © 2016. All rights reserved.

www.jrheum.orgDownloaded on May 20, 2021 from

Page 3: Copy Number Variation of HLA-DQA1 and APOBEC3A/3B ...Systemic sclerosis (SSc), also called scleroderma, is an immune-mediated disease characterized by extensive fibrosis of the skin

competitive real-time PCR reaction. The PCR products were diluted andloaded onto an ABI 3730XL sequencer for quantification analysis. Raw datawere analyzed by Gene Mapper 4.0, and Hg19 was used for the genomebuild for the genomic coordinates. The peak ratio between sample DNA andthe corresponding competitive DNA (S/C) was calculated and thennormalized to the median of the 4 preset 2-copy reference genes, respec-tively. Two normalized S/C ratios were further normalized to the medianvalue in all samples for each reference gene and then averaged. The copynumber of each target fragment was determined by the average S/C ratio ×2. Cases and controls were examined and read at the same time to minimizenonrandom errors.Statistical analysis. In the validation stage, binary logistic regression wasapplied to discover association between CNV and SSc, and adjusted withsex and age to estimate the marginal effects of candidate CNV (marginaleffect model). The partial effect for a specific CNV was also estimated andadjusted with other remaining CNV, sex, and age (partial effect model). Inthe partial effect model, for each of the 5 CNV selected for validation, theremaining 4 were adjusted so that the partial effect for each CNV could beestimated simultaneously. OR and 95% CI were calculated with the R code.Association between CNV and organ involvement was conducted withbinary logistic model, while association between CNV and anti-RNAP 3(continuous variable) was conducted with linear regression models. Theindividual with missing age, sex, or some other information would beomitted in the regression model. Chi-square or the Fisher’s exact test was

applied for an independent test between autoantibody and organinvolvement. The effect size of the CNV in subgroup analyses wasconducted to certain samples with specific clinical characteristics comparedwith normal. R packages32 “PredictABEL”33 and “pROC”34 were appliedfor the receiver-operating characteristic (ROC) plot.

RESULTSCNV discovery in patients with SSc. Genome-wide CNVanalysis was conducted in 20 patients with SSc with aCGHarray. The demographic characteristics of the 20 samples areshown in Supplementary Table 2 (available from the authorson request). There were 344 CNV (average 57.3, SD 8.3)found in the 20 individuals, including 140 gains and 204losses, and they covered a 24.2-megabase of the wholegenome (Figure 1). CNV hotspot was observed in the 6q21.3region, which indicated that CNV in the HLA region mightbe associated with SSc. In the 22q11.2 region, a CNV hotspotwas also detected with a large amount of CNV loss. Afterfiltering the common CNV and our previously reportedChinese common CNV (1440 CNV regions)35, there were atotal of 31 CNV remaining (8 gains, 23 loss; average = 5, SD2.9), some of which may be the causal/risk CNV for SSc.There was no significant difference in the length of thecommon and novel CNV (Wilcoxon test, p = 0.26). Therewere 23 genes and correspondingly 119 RNA involved with31 novel CNV (Supplementary Table 3, available from theauthors on request). There was no significant differencebetween the loss/gain ratio between common and new CNV(chi-square test, p = 0.11). In our SSc CNV map, the numberof deletion counts was much higher than that of duplicationcounts (Student t test, p < 0.004), while the size of duplicationwas much larger than that of deletion (Student t test, p =0.03). These results suggested that common or novel CNVmay be ubiquitous in the SSc genome and be involved in thepathogenesis of SSc.Association between CNV of APOBEC3A/3B, HLA-DQA1,and SSc susceptibility. To confirm what we found in thediscovery stage, 5 common CNV regions (HLA-DRB5,HLA-DQA1, IRGM, CDC42EP3, and APOBEC3A/3B),which were found in at least 2 of the above samples, werevalidated in the validation stage. A total of 365 patients with

882 The Journal of Rheumatology 2016; 43:5; doi:10.3899/jrheum.150945

Personal non-commercial use only. The Journal of Rheumatology Copyright © 2016. All rights reserved.

Table 1. Proportion of antibody and organ involvement. Values are n (%)unless otherwise specified.

Variables Positive Negative Total, n

AntibodiesANA 278 (76.2) 18 (4.9) 296ATA 164 (44.9) 151 (41.4) 315ACA 32 (8.8) 227 (62.2) 259ARA 45 (12.3) 251 (68.8) 296Anti-Sm 11 (3) 250 (68.5) 261Anti-SSA 84 (23) 181 (49.6) 265Anti-SSB 22 (6) 239 (65.5) 261Anti-PM-1 7 (1.9) 163 (44.7) 170Anti-Jo1 9 (2.5) 241 (66) 250RF 44 (12.1) 139 (38.1) 183

Organ involvementLung 77 (21.1) 211 (57.8) 288Heart 194 (53.2) 86 (23.6) 280

ANA: antinuclear; ATA: anti-DNA topoisomerase I; ACA: anticentromere;ARA: anti-U1RNP; RF: rheumatoid factor.

Table 2.Association between serological profile and clinical characteristics in SSc. Chi-square or 1-way ANOVA was conducted to make association analysisbetween categorical variables or categorical continuous variables. P value < 0.05 was considered a significant association. Corresponding OR and 95% CI areshown in Supplementary Table 5 (available from the authors on request).

Characteristics ANA ATA ACA Anti-RNAP 3 ARA Anti-Sm SSA SSB Anti-PM-1 RF

Sex 0.3223 0.6592 0.1294 0.8778 0.003 0.1394 0.8701 0.5802 0.6897 0.5517lcSSc/dcSSc 1 0.025 0.0165 0.0902 0.0105 0.2184 0.061 0.017 0.0585 0.1714CREST 1 1 1 0.0198 0.0145 0.2599 0.2024 1 0.1979 1Lung 0.7411 0.0205 0.026 0.8646 0.4398 0.1709 0.8726 0.08 1 0.3983Heart 0.5132 1 0.8106 0.2396 0.7251 0.7676 0.3678 0.4733 1 0.8406

SSc: systemic sclerosis; ANA: antinuclear antibodies; ATA: anti-DNA topoisomerase I antibody; ACA: anticentromere antobodies; anti-RNAP 3: anti-RNApolymerase 3 antibodies; ARA: anti-U1RNP; RF: rheumatoid factor; lcSSc: limited cutaneous SSc; dcSSc: diffuse cutaneous SSc; CREST (syndrome): calci-nosis, Raynaud phenomenon, esophageal dysmotility, sclerodactyly, and telangiectasias.

www.jrheum.orgDownloaded on May 20, 2021 from

Page 4: Copy Number Variation of HLA-DQA1 and APOBEC3A/3B ...Systemic sclerosis (SSc), also called scleroderma, is an immune-mediated disease characterized by extensive fibrosis of the skin

883Guo, et al: CNV susceptibility to SSc

Figure 1. Genome-wide aCGH analysis reveals copynumber variation (CNV) in the SSc population.Significant SSc-associated CNV identified by aCGHarray were noted in the human genome. Red bars repre-sents high copy number in SSc, while green bars representlow copy number in SSc. aCGH: array comparativegenomic hybridization; SSc: systemic sclerosis.

Figure 2. Performance of 4 prediction models based onthe ROC curve. Model A included sex, age, and theseCNV: HLA-DRB5, HLA-DQA1, IRGM, CDC42EP3, andAPOBEC3A/3B. Model B included the sex, age, and copynumber of HLA-DQA1 and APOBEC3A/3B, which wereselected with a forward conditional stepwise method fromall the covariates. Model C included sex and copy numberof HLA-DQA1 and APOBEC3A/3B. Model D included 2SSc-susceptibility genetic factors and the copy number ofHLA-DQA1 and APOBEC3A/3B. ROC: receiver-operationcharacteristic; CNV: copy number variations; SSc:systemic sclerosis; AUC: area under the curve.

Personal non-commercial use only. The Journal of Rheumatology Copyright © 2016. All rights reserved.

www.jrheum.orgDownloaded on May 20, 2021 from

Page 5: Copy Number Variation of HLA-DQA1 and APOBEC3A/3B ...Systemic sclerosis (SSc), also called scleroderma, is an immune-mediated disease characterized by extensive fibrosis of the skin

SSc and 369 ethnically matched healthy individuals wereenrolled in our study. There were no significant differencesfor the proportion of men/women in cases and controls(Supplementary Table 4, available from the authors onrequest). The patients with SSc included 52% with lcSSc and41% with dcSSc, while the remaining 7% were not clearlysubtyped and therefore could not be assigned to either group.The autoantibody status is shown in Table 1. Four autoanti-bodies were observed to be significantly differentlydistributed between lcSSc and dcSSc subtypes: ACA (OR2.58, 95% CI 1.2–5.7, p = 0.024), ATA (OR 0.57, 95% CI0.35–0.91, p = 0.025), ARA (OR 2.37, 95% CI 1.22–4.62, p = 0.014), and anti-SSB (OR 0.27, 95% CI 0.09–0.83, p =0.016). The ARA-positive proportion in women was signifi-cantly higher than in men (OR 11.5, Fisher’s exact test, p =0.003). Associations among other clinical characteristics areshown in Table 2 and Supplementary Table 5 (available fromthe authors on request).

The copy numbers of 5 genomic regions located in or near HLA-DRB5, HLA-DQA1, IRGM, CDC42EP3, andAPOBEC3A/3Bwere examined in 365 patients with SSc and369 healthy individuals (Supplementary Table 3, availablefrom the authors on request). Among all the samples, CNVgenotyping rates were greater than 80%. Both marginal effectand partial effect from the logistic regressions showed thatCNV of HLA-DQA1 and APOBEC3A/B were significantlyassociated with the risk of SSc (Table 3). Low copy numberof HLA-DQA1 was a significantly protective factor for SSc(OR 0.07, p = 2.99 × 10–17) while high APOBEC3A/B was asignificant risk factor (OR 3.45, p = 6.4 × 10–18). Inconsistent

conclusions were found for HLA-DRB5 and CDC42EP3between the marginal and partial effect models. HighHLA-DRB5 was significantly associated with SSc in themarginal effect model (OR 2.54, p = 7.4 × 10–21), while highCDC42EP3was significantly associated with SSc only in thepartial effect model (OR 1.33, p = 0.02). Because no signifi-cant association was found between CNV of IRGM and SSc,subgroup univariate logistic regressions were conducted tovalidate whether CNV of IRGM was associated with certainsubtypes of SSc. Case individuals from the subtypes by thestatus of 10 autoimmune antibodies or organ involvementwere compared with controls using binary logistic regression.However, no significant association was found betweenIRGM and susceptibility to any subtype of SSc (Supple-mentary Tables 6–11, available from the authors on request).Risk prediction models established based on sex, age, and/orCNV. Compared with the OR to display the associationbetween genetic variants and a phenotype, disease riskprediction models are more clinically useful. Herein, wereport 4 prediction models of the absolute risk for patientswith SSc; these models were based on a logistic regressionmodel.

As Figure 2 shows, model A included sex, age, and all ofthe 5 identified CNV. Model B included the sex, age, andcopy number of HLA-DQA1 and APOBEC3A/3B, whichwere selected with the forward conditional stepwise methodfrom all the covariates. Model C included sex and copynumber of HLA-DQA1 and APOBEC3A/3B, because thesefactors remain unchanged. Model D features includedSSc-susceptibility genetic factor and copy number of

884 The Journal of Rheumatology 2016; 43:5; doi:10.3899/jrheum.150945

Personal non-commercial use only. The Journal of Rheumatology Copyright © 2016. All rights reserved.

Table 3.Association between CNV and SSc in marginal and partial logistic model.

Variables OR (95% CI)* p* OR (95% CI)** p**

HLA-DRB5 2.54 (2.09–3.08) 7.40 × 10–21 1.25 (0.98–1.59) 0.073HLA-DQA1 0.07 (0.03–0.12) 2.99 × 10–17 0.09 (0.04–0.18) 2.59 × 10–11Near IRGM 1.19 (0.95–1.49) 0.1319 1.25 (0.96–1.63) 0.093Near CDC42EP3 1.17 (0.96–1.44) 0.1209 1.33 (1.05–1.68) 0.02APOBEC3A/3B 3.4 (2.58–4.5) 6.40 × 10–18 1.6 (1.18–2.18) 0.002

* Values from partial association based on multivariate logistic regression. ** Values from marginal associationbased on univariate logistic regression. Both of these logistic regression tests were adjusted for age and sex. Neitherp value was adjusted by multiple comparison correction. Significant data are in bold face. CNV: copy numbervariations; SSc: systemic sclerosis.

Table 4. Distinguishing the ability of 4 models to predict risk of SSc. The outpoint corresponding to maximum ofthe sum of sensitivity and specificity was considered the threshold.

Model Sensitivity Specificity AUC (95% CI) Variate

Model A 0.89 0.60 0.81 (0.78–0.84) Sex, age, 5 CNVModel B 0.82 0.65 0.80 (0.77–0.83) Sex, age, HLA-DQA1, APOBEC3A/3BModel C 0.99 0.45 0.75 (0.72–0.79) Sex, HLA-DQA1, APOBEC3A/3BModel D 0.99 0.46 0.73 (0.69–0.76) HLA-DQA1, APOBEC3A/3B

SSc: systemic sclerosis; AUC: area under the curve; CNV: copy number variations.

www.jrheum.orgDownloaded on May 20, 2021 from

Page 6: Copy Number Variation of HLA-DQA1 and APOBEC3A/3B ...Systemic sclerosis (SSc), also called scleroderma, is an immune-mediated disease characterized by extensive fibrosis of the skin

885Guo, et al: CNV susceptibility to SSc

HLA-DQA1 and APOBEC3A/3B (Table 4). The ROC curveanalysis showed that HLA-DQA1 and APOBEC3A/3Bcombined with sex and age had moderate prediction power[area under the curve (AUC) = 0.80, 95% CI 0.77–0.83].Thus, the CNV of HLA-DQA1 and APOBEC3A/3Bwould bea potentially independent genetic susceptibility factor to SSc.

DISCUSSIONOur results provided evidence that the low copy number ofHLA-DQA1 (OR 0.07, p = 2.99 × 10–17) and high copynumber of APOBEC3A/3B (OR 3.45, p = 6.4 × 10–18) signifi-cantly contributed to the susceptibility to SSc in the ChineseHan population.

Common CNV represent an important source of geneticdiversity, yet their influence on phenotypic variability anddisease susceptibility remains poorly analyzed. We providedmore evidence about the involvement of common CNV oncomplex disease, especially on immune system-relateddisease. Our genome-wide CNV profile in SSc and normalsubjects was also supported by some previous studies, suchas one finding that 22q11.2 deletion may result in variableclinical phenotypes and that 22q11.2 deletion individuals areat increased risk of a variety of autoimmune diseases36. Inaddition, the CNV pattern in our present study were alsoobserved in our previous discovery in a healthy Chinesepopulation35.

The HLA-DQA1 gene is one of the HLA complex genesencoding a protein that presents specific antigen peptides toT cell receptor to initiate immune response. Genetic allelicvariation in HLA-DQA1 has been reported to be associatedwith SSc37. The association of CNV of HLA-DQA1with SScindicates dosage of HLA-DQA1 and influences the suscepti-bility to SSc. Additionally, CNV of HLA-DQA1 has beenconfirmed to be associated with many kinds of autoimmunediseases, such as AS21.APOBEC3 are a family of DNA-editing enzymes thought

to be part of the innate immune system by restricting retro-viruses, mobile genetic elements such as retrotransposons,and endogenous retroviruses. APOBEC3A is an importantepigenetic-related regulation factor38,39 that is highlyexpressed in monocytes and macrophages upon stimulationwith interferon. It can activate the DNA damage response andcause cell-cycle arrest40; APOBEC3A and APOBEC3B alsoare potent inhibitors of long terminal repeat retrotransposonfunction in human cells41. In such situations, the copynumber of APOBEC3A/3B will cause different reactions ofthe innate immune system and might have some effect on thepathogenesis of SSc.

In the prediction model section, we show the predictionperformance based on CNV and some other confoundersquantitatively. In addition, AUC could be used to comparewith other prediction models, which were established fromSNP or epidemiological factors. However, we also admit thatthe eventual prediction model might include all the true

explanatory variables such as SNP, CNV, and even someepigenetic variations in the clinical or epidemiological appli-cation in the population or hospital scenario. Moreover, theperfect prediction model should be applied 5-fold or 10-foldin cross-validation or independent dataset validation, whilethe sample size in our present study is limited. We would testand reevaluate our model in our future samples. We also couldnot detect all the significantly associated variations such asSNP, methylation, and others; thus our present study is prelim-inary, and we tried to identify some CNV variations associatedwith SSc. In the future, we could build more accurate andcredible prediction models for SSc risk evaluation.

More and more aberrant CNV were found in SSc, such asFCGR3B27; therefore, genome-wide association betweenCNV and SSc should be studied to identify more susceptibilityfactors to SSc. To our knowledge, ours is the first SSc CNVmap in the Chinese population, although the sample size islimited. More samples are being tested, to discover moreSSc-specific or -associated CNV. In addition, we designedseveral probes to detect the CNV in our regions of interest.However, some CNV might be located outside the targetregion, which would provide underestimates for the effect sizeof the CNV to SSc susceptibility. Also, CNV can be limitedto a single gene or include a contiguous set of genes. In thelatter situation, CNV will greatly influence the human genomeand so may be responsible for a substantial amount of humanphenotypic variability, complex behavioral traits, and diseasesusceptibility. Therefore, our present study provided importantevidence to identify more and more missing heritability forcomplex diseases based on common CNV.

Low copy of HLA-DQA1 and high copy of APOBEC3A/3Bregions are significantly associated with the susceptibility toSSc.

REFERENCES 1. Bunn CC, Black CM. Systemic sclerosis: an autoantibody mosaic.

Clin Exp Immunol 1999;117:207-8. 2. Tamby MC, Chanseaud Y, Guillevin L, Mouthon L. New insights

into the pathogenesis of systemic sclerosis. Autoimmun Rev2003;2:152-7.

3. Silman A, Jannini S, Symmons D, Bacon P. An epidemiologicalstudy of scleroderma in the West Midlands. Br J Rheumatol1988;27:286-90.

4. Arias-Nuñez MC, Llorca J, Vazquez-Rodriguez TR, Gomez-AceboI, Miranda-Filloy JA, Martin J, et al. Systemic sclerosis in northwestern Spain: a 19-year epidemiologic study. Medicine2008;87:272-80.

5. Mayes MD, Lacey JV Jr, Beebe-Dimmer J, Gillespie BW, CooperB, Laing TJ, et al. Prevalence, incidence, survival, and diseasecharacteristics of systemic sclerosis in a large US population.Arthritis Rheum 2003;48:2246-55.

6. Wang J, Assassi S, Guo G, Tu W, Wu W, Yang L, et al. Clinical andserological features of systemic sclerosis in a Chinese cohort. ClinRheumatol 2013;32:617-21.

7. Arnett FC, Cho M, Chatterjee S, Aguilar MB, Reveille JD, MayesMD. Familial occurrence frequencies and relative risks for systemicsclerosis (scleroderma) in three United States cohorts. ArthritisRheum 2001;44:1359-62.

Personal non-commercial use only. The Journal of Rheumatology Copyright © 2016. All rights reserved.

www.jrheum.orgDownloaded on May 20, 2021 from

Page 7: Copy Number Variation of HLA-DQA1 and APOBEC3A/3B ...Systemic sclerosis (SSc), also called scleroderma, is an immune-mediated disease characterized by extensive fibrosis of the skin

886 The Journal of Rheumatology 2016; 43:5; doi:10.3899/jrheum.150945

Personal non-commercial use only. The Journal of Rheumatology Copyright © 2016. All rights reserved.

8. Feghali-Bostwick C, Medsger TA Jr, Wright TM. Analysis ofsystemic sclerosis in twins reveals low concordance for disease andhigh concordance for the presence of antinuclear antibodies.Arthritis Rheum 2003;48:1956-63.

9. Jüngel A, Distler JH, Gay S, Distler O. Epigenetic modifications:novel therapeutic strategies for systemic sclerosis? Expert Rev ClinImmunol 2011;7:475-80.

10. Martin JE, Broen JC, Carmona FD, Teruel M, Simeon CP, VonkMC, et al. Identification of CSK as a systemic sclerosis genetic riskfactor through Genome Wide Association Study follow-up. HumMol Genet 2012;21:2825-35.

11. Gorlova O, Martin JE, Rueda B, Koeleman BP, Ying J, Teruel M, etal. Identification of novel genetic markers associated with clinicalphenotypes of systemic sclerosis through a genome-wide association strategy. PLoS Genet 2011;7:e1002178.

12. Allanore Y, Saad M, Dieude P, Avouac J, Distler JH, Amouyel P, etal. Genome-wide scan identifies TNIP1, PSORS1C1, and RHOB asnovel risk loci for systemic sclerosis. PLoS Genet 2011;7:e1002091.

13. López-Isac E, Bossini-Castillo L, Simeon CP, Egurbide MV, Alegre-Sancho JJ, Callejas JL, et al. A genome-wide associationstudy follow-up suggests a possible role for PPARG in systemicsclerosis susceptibility. Arthritis Res Ther 2014;16:R6.

14. Sharif R, Mayes MD, Tan FK, Gorlova OY, Hummers LK, ShahAA, et al. IRF5 polymorphism predicts prognosis in patients withsystemic sclerosis. Ann Rheum Dis 2012;71:1197-202.

15. Koumakis E, Wipff J, Dieude P, Ruiz B, Bouaziz M, Revillod L, etal. TGFβ receptor gene variants in systemic sclerosis-relatedpulmonary arterial hypertension: results from a multicentreEUSTAR study of European Caucasian patients. Ann Rheum Dis2012;71:1900-3.

16. Bossini-Castillo L, Martin JE, Broen J, Simeon CP, Beretta L,Gorlova OY, et al. Confirmation of TNIP1 but not RHOB andPSORS1C1 as systemic sclerosis risk factors in a large independentreplication study. Ann Rheum Dis 2013;72:602-7.

17. Yi L, Wang JC, Guo XJ, Gu YH, Tu WZ, Guo G, et al. STAT4 is agenetic risk factor for systemic sclerosis in a Chinese population. IntJ Immunopathol Pharmacol 2013;26:473-8.

18. Guo S, Wang YL, Li Y, Jin L, Xiong M, Ji QH, et al. SignificantSNPs have limited prediction ability for thyroid cancer. Cancer Med2014;3:731-5.

19. Manolio TA, Collins FS, Cox NJ, Goldstein DB, Hindorff LA,Hunter DJ, et al. Finding the missing heritability of complexdiseases. Nature 2009;461:747-53.

20. Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, ThorneN, et al. Relative impact of nucleotide and copy number variationon gene expression phenotypes. Science 2007;315:848-53.

21. Wang J, Yang Y, Guo S, Chen Y, Yang C, Ji H, et al. Associationbetween copy number variations of HLA-DQA1 and ankylosingspondylitis in the Chinese Han population. Genes Immun2013;14:500-3.

22. McKinney C, Fanciulli M, Merriman ME, Phipps-Green A,Alizadeh BZ, Koeleman BP, et al. Association of variation inFcgamma receptor 3B gene copy number with rheumatoid arthritisin Caucasian samples. Ann Rheum Dis 2010;69:1711-6.

23. Thabet MM, Huizinga TW, Marques RB, Stoeken-Rijsbergen G,Bakker AM, Kurreeman FA, et al. Contribution of Fcgammareceptor IIIA gene 158V/F polymorphism and copy numbervariation to the risk of ACPA-positive rheumatoid arthritis. AnnRheum Dis 2009;68:1775-80.

24. McKinney C, Merriman ME, Chapman PT, Gow PJ, Harrison AA,Highton J, et al. Evidence for an influence of chemokine ligand

3-like 1 (CCL3L1) gene copy number on susceptibility torheumatoid arthritis. Ann Rheum Dis 2008;67:409-13.

25. García-Ortiz H, Velázquez-Cruz R, Espinosa-Rosales F, Jiménez-Morales S, Baca V, Orozco L. Association of TLR7 copynumber variation with susceptibility to childhood-onset systemiclupus erythematosus in Mexican population. Ann Rheum Dis2010;69:1861-5.

26. Wu L, Guo S, Yang D, Ma Y, Ji H, Chen Y, et al. Copy numbervariations of HLA-DRB5 is associated with systemic lupus erythematosus risk in Chinese Han population. Acta BiochimBiophys Sin 2014;46:155-60.

27. McKinney C, Broen JC, Vonk MC, Beretta L, Hesselstrand R,Hunzelmann N, et al. Evidence that deletion at FCGR3B is a riskfactor for systemic sclerosis. Genes Immun 2012;13:458-60.

28. Du R, Lu C, Jiang Z, Li S, Ma R, An H, et al. Efficient typing ofcopy number variations in a segmental duplication-mediatedrearrangement hotspot using multiplex competitive amplification. J Hum Genet 2012;57:545-51.

29. Lonzetti LS, Joyal F, Raynauld JP, Roussin A, Goulet JR, Rich E, etal. Updating the American College of Rheumatology preliminaryclassification criteria for systemic sclerosis: Addition of severenailfold capillaroscopy abnormalities markedly increases the sensitivity for limited scleroderma. Arthritis Rheum 2001;44:735-6.

30. LeRoy EC, Black C, Fleischmajer R, Jablonska S, Krieg T, MedsgerTA Jr, et al. Scleroderma (systemic sclerosis): classification, subsetsand pathogenesis. J Rheumatol 1988;15:202-5.

31. Steen VD, Medsger TA Jr. Severe organ involvement in systemicsclerosis with diffuse scleroderma. Arthritis Rheum 2000;43:2437-44.

32. Dessau RB, Pipper CB. [“R”—project for statistical computing].[Article in Danish] Ugeskr Laeger 2008;170:328-30.

33. Kundu S, Aulchenko YS, van Duijn CM, Janssens AC.PredictABEL: an R package for the assessment of risk predictionmodels. Eur J Epidemiol 2011;26:261-4.

34. Robin X, Turck N, Hainard A, Tiberti N, Lisacek F, Sanchez JC, etal. pROC: an open-source package for R and S+ to analyze andcompare ROC curves. BMC Bioinformatics 2011;12:77.

35. Lou H, Li S, Yang Y, Kang L, Zhang X, Jin W, et al. A map of copynumber variations in Chinese populations. PLoS One2011;6:e27341.

36. McLean-Tooke A, Spickett GP, Gennery AR. Immunodeficiencyand autoimmunity in 22q11.2 deletion syndrome. Scand J Immunol2007;66:1-7.

37. Lambert NC, Distler O, Müller-Ladner U, Tylee TS, Furst DE,Nelson JL. HLA-DQA1*0501 is associated with diffuse systemicsclerosis in Caucasian men. Arthritis Rheum 2000;43:2005-10.

38. Berger G, Durand S, Fargier G, Nguyen XN, Cordeil S, Bouaziz S,et al. APOBEC3A is a specific inhibitor of the early phases of HIV-1 infection in myeloid cells. PLoS Pathog 2011;7:e1002221.

39. Stenglein MD, Burns MB, Li M, Lengyel J, Harris RS. APOBEC3proteins mediate the clearance of foreign DNA from human cells.Nat Struct Mol Biol 2010;17:222-9.

40. Landry S, Narvaiza I, Linfesty DC, Weitzman MD. APOBEC3A canactivate the DNA damage response and cause cell-cycle arrest.EMBO Rep 2011;12:444-50.

41. Bogerd HP, Wiegand HL, Doehle BP, Lueders KK, Cullen BR.APOBEC3A and APOBEC3B are potent inhibitors of LTR-retrotransposon function in human cells. Nucleic Acids Res2006;34:89-95.

www.jrheum.orgDownloaded on May 20, 2021 from