University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a...

32
Expanded genetic landscape of chronic obstructive pulmonary disease reveals heterogeneous cell type and phenotype associations Summary BACKGROUND: Chronic obstructive pulmonary disease (COPD) risk is influenced by genetics, but the majority of genetic factors are still unknown. The increasing availability of large-scale datasets for discovery and interpretation of genetic loci can provide novel insights into disease pathogenesis, including cell type and phenotype associations. METHODS: We performed a genome-wide association study of spirometrically-defined COPD in 24 studies, including the UK Biobank. We sought to determine, through bioinformatic and computational analysis, the likely set of variants, genes, cell types, and biologic pathways implicated by these associations. Finally, we assessed our genetic findings for relevance to COPD-related phenotypes, other respiratory diseases, and comorbidities. FINDINGS: In a total of 35,164 cases (21,081 from the Biobank) and 215,903 controls, we identified 82 loci with P value < 5 x 10 -8 . Of 35 signals not previously described in association with COPD or lung function, 13 replicated through association with lung function measures in the SpiroMeta consortium, most of the remaining signals had suggestive levels of association. Using gene expression and regulation data, we identified enrichment of signal in fetal lung, smooth muscle and alveolar type II cells. Candidate effector genes and pathways implicated by bioinformatic analysis included CHIA, IL17RD, OBFC1, and ITGB8. We found 15 shared genomic regions between COPD and asthma and 4 regions overlapping between COPD and pulmonary fibrosis. COPD genetic risk loci clustered into subsets with heterogeneous quantitative imaging features and comorbidity associations. INTERPRETATION: In a large-scale genetic associations study of COPD, genetic loci implicate a veried set of cell types, disease pathways, and phenotypes. Our study substantially expands the number of COPD-

Transcript of University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a...

Page 1: University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD,

Expanded genetic landscape of chronic obstructive pulmonary disease reveals heterogeneous cell type and phenotype associationsSummaryBACKGROUND: Chronic obstructive pulmonary disease (COPD) risk is influenced by genetics, but the majority of genetic factors are still unknown. The increasing availability of large-scale datasets for discovery and interpretation of genetic loci can provide novel insights into disease pathogenesis, including cell type and phenotype associations.

METHODS: We performed a genome-wide association study of spirometrically-defined COPD in 24 studies, including the UK Biobank. We sought to determine, through bioinformatic and computational analysis, the likely set of variants, genes, cell types, and biologic pathways implicated by these associations. Finally, we assessed our genetic findings for relevance to COPD-related phenotypes, other respiratory diseases, and comorbidities.

FINDINGS: In a total of 35,164 cases (21,081 from the Biobank) and 215,903 controls, we identified 82 loci with P value < 5 x 10-8. Of 35 signals not previously described in association with COPD or lung function, 13 replicated through association with lung function measures in the SpiroMeta consortium, most of the remaining signals had suggestive levels of association. Using gene expression and regulation data, we identified enrichment of signal in fetal lung, smooth muscle and alveolar type II cells. Candidate effector genes and pathways implicated by bioinformatic analysis included CHIA, IL17RD, OBFC1, and ITGB8. We found 15 shared genomic regions between COPD and asthma and 4 regions overlapping between COPD and pulmonary fibrosis. COPD genetic risk loci clustered into subsets with heterogeneous quantitative imaging features and comorbidity associations.

INTERPRETATION: In a large-scale genetic associations study of COPD, genetic loci implicate a veried set of cell types, disease pathways, and phenotypes. Our study substantially expands the number of COPD-associated loci and provides multiple avenues for investigation of COPD pathogenesis and disease heterogeneity.

FUNDING: US National Heart, Lung, and Blood Institute; the Alpha-1 Foundation; the COPD Foundation through contributions from AstraZeneca, Boehringer Ingelheim, Novartis, and Sepracor; GlaxoSmithKline; Centers for Medicare and Medicaid Services; Agency for Healthcare Research and Quality; and US Department of Veterans Affairs.

Background

Chronic obstructive pulmonary disease (COPD) is a disease of enormous and growing global burden1. Environmental risk factors, predominately cigarette smoking, account for a large fraction of disease risk, but there is considerable variability in COPD susceptibility among individuals with similar smoking exposure. Studies in families and in populations demonstrate that genetic factors account for a substantial fraction of disease susceptibility.

Page 2: University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD,

The most well-known genetic risk factor for COPD, alpha-1 antitrypsin deficiency, is due to rare variants in the SERPINA1 gene, and has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD, it is likely that, similar to other adult onset complex diseases, the majority of population risk lies in common variants2. These variants, while individually of weak effect, can lead to novel insights into disease pathogenesis3,4. The discovery of each additional locus is not only in itself an opportunity to discovery potentially new biology, but can lead to more global insights, such as functional or disease links between loci, or identification of which cell types may be more likely to drive the genetics of COPD risk.

Recently, we performed a large genome-wide association analysis of COPD as part of the International COPD Genetics Consortium, identifying 22 genome-wide significant loci5. We sought to combine this data with the UK Biobank6, a population-based study of several hundred thousand subjects with lung function and cigarette smoking assessment. We hypothesized that this larger genome-wide association study would confirm prior associations with COPD, show additional overlap with known associations with population-based lung function, and also identify new COPD susceptibility loci. We sought to determine, through bioinformatic and computational analysis, the likely set of variants, genes, and cell types, and biologic pathways, implicated by these associations. Finally, we assessed our genetic findings for relevance to COPD-specific, respiratory, and other phenotypes.

MethodsStudy populationsThe UK Biobank is a population-based cohort consisting of 502,682 individuals6. To determine lung function, we used measures of forced expiratory volume in 1 second (FEV1) and forced vital capacity (FVC) derived from the blow curve time series and subject to additional quality control based on ATS/ERS criteria7 (Supplementary Text). We defined COPD in white subjects using two pre-bronchodilator measurements of lung function according to modified GOLD criteria for moderate to very severe airflow limitation8: FEV1 less than 80% of predicted value (using reference equations from Hankinson et al.9), and the ratio between FEV1 and FVC less than 0.7. Genotyping was performed using Axiom UK BiLEVE array and Axiom Biobank array (Affymetrix, Santa Clara, California, USA) and imputed to the Haplotype Reference Consortium (HRC) version 1.1 panel10.

We invited prior participants in the International COPD Genetics Consortium (ICGC) to provide case-control association results, with the exception of the 1958 British Birth Cohort, to avoid overlapping samples with the SpiroMeta replication cohort. Detailed cohort descriptions and cohort-specific methods have been previously published5 (Supplementary Text). Briefly, ICGC cohorts performed case-control association analysis based on pre-bronchodilator measurements of FEV1 and FEV1/FVC, and cases were identified using modified GOLD criteria, as above.

We performed lookups of select significant variants for FEV1 and FEV1/FVC in the SpiroMeta consortium meta-analysis. Briefly, SpiroMeta comprised of a total of 83,118 individuals from 21 studies imputed to either the 1000 Genomes Project Phase 1 reference panel (13 studies) or the HRC (9 studies). Each study performed linear regression adjusting for age, age2, sex, and height, using rank-based inverse normal transforms, adjusting for population substructure using principal components or linear mixed models, and performing separate analyses for ever- and never- smokers or using a covariate (for studies

Page 3: University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD,

of related subjects). Genomic control was applied to individual studies, and results were combined using fixed-effects meta-analysis.

Genome-wide association analysisIn UK Biobank, we performed logistic regression of COPD, adjusting for age, sex, genotyping array, smoking pack-years, ever smoking status, and principal components of genetic ancestry. Association analysis was done using PLINK 2.0 alpha11 (downloaded on December 11, 2017) with Firth-fallback settings, using Firth regression when quasi-complete separation or regular-logistic-regression convergence failure occurred. We performed a fixed-effects meta-analysis of all ICGC cohorts and UK Biobank using METAL (version 2010-08-01)12. We assessed population substructure and cryptic relatedness by linkage disequilibrium (LD) score regression intercept13. We defined a genetic locus using a 2-Mb window (1 Mb at either end) around a lead variant, with conditional analyses as described below.

To maximize our power to identify existing and discover new loci, we examined all loci at the traditional genome-wide significance value of 5 x 10-8. Given the strong genetic correlation between population-based lung function and COPD, we used FEV1 and FEV1/FVC in the SpiroMeta study to provide evidence of replication (Figure 1). We first characterized loci as being previously described (evidence of prior association with lung function or COPD) or novel. For novel loci, we attempted to provide evidence of replication using SpiroMeta, with Bonferroni correction for number of tests. We also examined loci meeting a nominal level of significance, or having consistent direction of effect in SpiroMeta.

Cigarette smoking is the major environmental risk factor for COPD. While we adjusted for cigarette smoking in our analysis, we and others have previously identified associations with cigarette smoking both overall and at individual loci5,14. To further examine these effects, we additionally tested for association of each locus with cigarette smoking and in two separate analyses of ever- and never- smokers in UK Biobank.

Identification of independent associations at genome-wide significant lociWe identified specific independent associations at genome-wide significant loci using GCTA-COJO15. This method utilizes approximate conditional and joint analysis approach which requires summary statistics and representative LD information. We used 10,000 randomly drawn individuals from the UK Biobank discovery dataset as a LD reference sample. We scaled genome-wide significance to a 2-Mb region, resulting in a locus-wide significant threshold of 8 x 10-5, or 2 x 10-6 for variants in MHC locus (chr6:28477797-33448354 in hg19). We created regional association plots via LocusZoom using 1000 Genomes EUR reference data (Nov2014 release)16.

Identification and prioritization of tissues and cell types, candidate variants, genes, and pathwaysIdentification of enriched tissues and specific cell typesWe used LD Score Regression (LDSC)17 to estimate the enrichment of functional annotation in disease heritability. We utilized LDSC baseline models (e.g., conserved region, promoter flanking region), tissue-specific annotations from the Roadmap Epigenomics Program18 and GenoSkyline19. We also used SNPsea20 to estimate the enrichment of specific cell types from genome-wide significant associations using gene expression data. For SNPsea, we performed the standard analyses, and added lung single-cell

Page 4: University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD,

RNA-seq data from 540 cells extracted from individuals with and without idiopathic pulmonary fibrosis21 using the processed and normalized dataset available through Gene Expression Omnibus (GSE86618). We tested for enrichment of genes specifically expressed in the four cell types individually among genome-wide significant loci. We also appled three additional single-cell RNA-seq datasets to the SNPsea results, including inflammatory cell types from individuals with lung cancer and normal controls (both in blood and lung)22, and murine lungs at E16.5 from LungMAP23, using the org.Mm.eg.db package24 to map locations of orthologous genes.

Fine-mapping of independent association signals at genome-wide significant lociWe used Bayesian fine-mapping at each locus to identify the credible set, the set of variants with a 99% probability of containing a causal variant. Briefly, for each genome-wide significant loci we calculated approximate Bayes factors25 of association. We then selected variants in each locus, so that their cumulative posterior probability is equal or greater than 0.99. At loci with multiple independent associations, we used statistics from approximate conditional analysis with GCTA software on each index variant adjusting for other independent variants in the loci. Otherwise, we used unconditioned statistics from our meta-analysis.

To characterize variant effects in credible sets, we annotated variants in each credible set using Ensembl Variant Effect Predictor through biomart26. We defined deleterious variants as those which resulted in nonsynonymous, stop, or splice variants (terms: transcript_ablation, splice_acceptor_variant, splice_donor_variant, stop_gained, frameshift_variant, stop_lost, start_lost, transcript_amplification, inframe_insertion, inframe_deletion, missense_variant, and protein_altering_variant).

Identification of target genesWe used several computational approaches with corresponding available datasets to identify target genes in genome-wide significant loci. We used two methods that utilized gene expression data: 1) S-PrediXcan and 2) DEPICT. We used S-PrediXcan27 to identify genes which for which genetically regulated expression associated with COPD. We used, as an eQTL and gene expression reference database, data from Lung-eQTL consortium28,29 (1,111 lung tissue samples). DEPICT (Data-driven Expression Prioritized Integration for Complex Traits) uses co-regulation of gene expression and gene-sets to prioritize target genes at associated loci30.

We also used additional information on gene regulation, including epigenetic data: 4) regulatory fine mapping, 5) mQTL, and 6) chromosome conformation capture. We used regulatory fine mapping (regfm31) to map open chromatin regions based on DNAse with gene expression of different cell types in the Roadmap Epigenomic Project, and implicate potentially target genes of disease-associated variants. We also searched for overlapping methylation quantitative trait loci (mQTL) data from lung tissue32. To determine whether these signals co-localized (rather than being related due to linkage disequilibrium), we performed colocalization analysis between our GWAS and mQTL in genome-wide significant loci using eCAVIAR33 (eQTL and GWAS CAusal Variants Identification in Associated Regions). We also sought information from publicly available chromosome conformation capture data34. We obtained statistics of high-confidence contacts in fetal lung fibroblast cell line (IMR90) and human lung tissue34 through HUGIn35 (Hi-C Unifying Genomic Interrogator). We anchored long range chromatin interactions on top associated variants, and computed statistical significance of contact at each locus. We retained only strongest associations (i.e., smallest P value) for each cell line/primary cell in the analysis.

Page 5: University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD,

Finally, we searched for signals from non-synonymous variants. We identified coding variants present in the credible set in the GWAS with a high posterior probability. We also searched for rare coding variants, based on exome sequencing results in the COPDGene, Boston Early-Onset COPD, and International COPD Genetics Network studies. Details of the exome sequencing results can be found in [Qiao et al, unpublished manuscript]. In brief, 485 severe COPD cases and 504 smoking resistant controls from the COPDGene study were exome-sequenced and analyzed as case-control data. 1554 subjects ascertained through probands with severe COPD from the Boston Early-Onset COPD study (BEOCOPD) and the ICGN study were exomed and analyzed using the lung function (FEV1) as the outcome. Single-variant analyses were done using Firth and efficient resampling methods (SKAT R package36) for the COPDGene data and generalized linear mixed models (GMMAT) for the BEOCOPD-ICGN data. Gene-based analyses were conducted using Burden, SKAT, and SKAT-O tests with asymptotic and efficient resampling methods (SKAT package) combined with Fisher’s method for the COPDGene data, and using SKAT-O tests (MONSTER) for the BEOCOPD-ICGN data. Two variant-filtering criteria were considered: deleterious variants (predicted by FATHMM) with MAF < 0.01, and functional variants (moderate effect predicted by SNPEff) with MAF < 0.05. Gene-based segregation test (GESE) was also applied to the ultra-rare (MAF < 0.1%) and loss-of-function variants in the BEOCOPD-ICGN data on the severe COPD affection status.

For each dataset described above, we used Bonferroni-corrected P values, or a fixed posterior probability threshold to determine target genes at each locus. To correct for possible number of genes in each locus, we obtained a list of protein-coding genes within 1 Mb from a top associated variant from biomart26. For each locus, we used a Bonferroni-corrected P value less than 0.05 (i.e., 0.05 divided by number of genes at that locus) to determine significance for 4data types: gene expression data, chromatin conformation capture data, co-regulation of gene expression, and exome sequencing results. For two remaining datasets, we used a fixed posterior probability threshold of 0.1 for regfm and eCAVIAR. We considered genes passing either of these criteria as target genes.

Identification of pathwaysTo identify enriched pathways in COPD-associated loci, we performed gene-set enrichment analysis using DEPICT30. We determined significant gene sets using P value at false discovery rate (FDR) less than 0.05.

Effects on COPD-related and other phenotypesCOPD is a complex and heterogeneous disorder, comprised of different biologic processes and specific phenotypic effects. In addition, many loci discovered by GWAS have pleiotropic effects. To identify these effects, we performed analyses of a) overall genetic correlation with other phenotypes, based on GWAS summary statistics; b) genetic association studies of our genome-wide significant findings using COPD-related phenotypes, including a cluster analysis to identify groups of variants that may be acting via similar mechanisms, c) identification of overlapping genetic loci between related disorders (e.g. asthma and pulmonary fibrosis). We estimated genetic correlation between COPD and 832 diseases/traits in LD Hub37. For our additional genetic association studies, we tested for associations between our genome-wide significant loci with 121 features (e.g., lung function, computed tomography-derived metrics, biomarkers, and comorbidities). To characterize effects of genome-wide significant loci to these features, we performed hierarchical clustering based on features with at least one nominal significant association. We identify optimal number of clusters using the Calinski index38,39. To identify features that independently predict cluster membership, we fitted a logistic regression model via

Page 6: University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD,

penalized maximum likelihood using the glmnet package40. We determined optimal regularization parameters using 10-fold cross validation. To identify overlapping loci between COPD and related disorders, we used gwas-pw41 to perform pairwise analysis of GWAS. This method searches for shared genomic segments42 using adaptive significance threshold, allowing detection of sub genome-wide significant loci. We identified shared segments or variants using posterior probability of colocalizing greater than 0.941. We obtained GWAS summary statistics from two previous studies of pulmonary fibrosis43 and multi-ethnic GWAS of asthma44.

Drug repositioning using drug-gene expression signaturesWe used a gene-based association method that utilizes GWAS summary statistics and gene expression reference databases to produce a gene list for drug-gene expression similarity analysis29,45,46. In brief, we tested for associations of the genetic component of gene expression and COPD. This method gave us a rank gene list of upregulated and down-regulated genes. We used the Query, an application in clue.io47, to calculate connectivity score of our gene set and drug-gene expression signatures. We included drug-gene expression signatures from 2,837 compounds in nine cell lines in the analysis47. To find candidate drugs for repositioning, we used negative connectivity score less than -90% as the threshold47, hypothesizing that a drug candidate need to produce an opposing or reverse expression signature induced by the disease.

Role of the funding sourceNone of the study sponsors had no role in study design, data collection, data analysis, data interpretation, or writing of the report. The corresponding author had full access to all the data in the study and had final responsibility for the decision to submit for publication.

ResultsGenome-wide association study of COPDAmong 502,682 participants in UK Biobank, we included 200,794 unrelated individuals of European ancestry passing genetic and phenotypic quality control, with non-missing covariates including cigarette smoke exposure, and identifiable as cases or controls based on spirometry (Supplementary Figure 1 and Supplementary Text). This corresponds to 21,081 cases and 179,713 controls in UK Biobank. In ICGC cohorts, we included 24 studies (14,083 cases and 42,190 controls). Characteristics of these individuals are shown in Supplementary Table 1. We tested associations of moderate to severe COPD and 7,810,595 variants which present in both UK Biobank and ICGC cohorts. We found no evidence of population substructure and cryptic relatedness in either cohorts (linkage disequilibrium score regression (LDSC) intercept 1·027 [s.e. 0·007] and 0·999 [s.e. 0·007] in UK Biobank and ICGC, respectively).

In our meta-analysis, we identified 82 loci (defined using 2-Mb windows) at the traditional genome-wide significance value of 5 x 10-8 (Figure 2). Sixteen of 82 loci were previously described as genome-wide significant in COPD, and an additional 31 loci were previously described in association with population-based lung function48 (Supplementary Table 2). This left 35 loci that were novel for COPD and lung function (Table 1). We then sought to replicate these novel loci in independent cohorts. Given the strong genetic correlation between population-based lung function and COPD, we performed a lookup

Page 7: University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD,

of these loci for FEV1 or FEV1/FVC in SpiroMeta. We identified 13 loci - C1orf87, DENND2D, DDX1, SLMAP, BTC, FGF18, CITED2, ITGB8, OBFC1, ARNTL, SERP2, DTWD1, and ADAMTSL3 (one-sided P < 0.05/37; Table 1). Although not meeting the strict Bonferroni threshold, additional 14 novel loci were nominally significant in SpiroMeta (consistent direction of effect and one-sided P < 0·05) including ASAP2, EML4, VGLL4, ADCY5, HSPA4, CCDC69, RREB1, ID4, IER3, RFX6, MFHAS1, COL15A1, TEPP, and THRA (Table 1).

In our overall meta-analysis, all 82 of the genome-wide significant loci showed consistent direction of effect with either FEV1 or FEV1/FVC ratio in SpiroMeta (Supplementary Text). Given the evidence of enrichment for prior signals and overall genome-wide significant results, we performed all subsequent analysis using the overall GWAS, including all 82 loci genome-wide significant in discovery, recognizing that some of our signals could be false positives.

Our 82 genome-wide significant variants explain up to 7·0 % of the phenotypic variance in liability scale, using a 10% prevalence of COPD, and acknowledging that these effects may be overestimated. This is a 48% increase in COPD phenotypic variance explained by genetic loci compared to the 4·7% explained by 22 loci reported in a recent GWAS of COPD5.

Relationship of genome-wide significant variants to smokingOf the 82 identified loci, three were associated with cigarette smoking in the UK Biobank after Bonferroni correction; the strongest association was for the known 15q25 locus. The next most highly associated locus was IER3 (P = 9·2 x 10-5) at 6p21; however, at this locus, the association with lung function in never-smokers in the UK Biobank was highly significant (P = 6·7 x 10-13). In addition, the SPPL2C (17q21) locus at 17q21 was associated with smoking (P=1·4 x 10-4) and was also significant in the analysis of never-smokers (2·8 x 10-5).

Identification of secondary association signalsWe then used approximate conditional and joint analysis to find secondary signals at each of the 82 genome-wide significant loci. We found secondary signals at 50 loci, with a maximum of five secondary signals identified at the RREB1 locus (Supplementary Table 3). Twenty of these signals were genome-wide significant, resulting in a total of 102 independent variants reaching traditional genome-wide significance (< 5 x 10-8) in our study.

Tissue and specific cell typesIn determining the tissue in which COPD genetic variants function to increase risk to COPD, lung is the obvious tissue to consider; however, COPD is a systemic disease49,50 and the specific cell-types underlying disease pathogenesis are largely unknown. Furthermore, available databases often include specific cell types (e.g. smooth muscle) from non-lung organs (e.g. the gastrointestinal tract). To identify putative causal tissues and cell types, we assessed the enrichment of our genome-wide significant COPD loci in ensemble tissue signatures19, tissue-specific epigenomic marks18, and genome-wide gene expression patterns20. Lung tissue showed the most significant enrichment (OR 9·25, P=1·36 x 10-9), as previously described, though significant enrichment was also seen in heart (OR 6·85, P=3·83 x 10-8) and the gastrointestinal (GI) tract (OR 5·53, P=6·45 x 10-11). In an analysis of enriched epigenomic marks, the most significant enrichment was in fetal lung and GI smooth muscle DNase (P= 6·75 x 10-8) and H3K4me1 (P= 7·31 x 10-7) (Supplementary Table 4). To further identify lung-specific cell types, we used SNPSea to examine gene expression from single-cell RNA-seq. We found nominally significant enrichment of

Page 8: University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD,

alveolar type II and basal cells (P=0·016 and P=0·042, respectively) in 82 genome-wide signficiant associations using gene expression data from respiratory cell types21 (Supplementary Figure 4). We were unable to identify a significant signal for enrichment in human22 and mouse lung23 inflammatory cells.

Fine-mapping of associated lociTo identify the most likely causal variants, we performed fine mapping using Bayesian credible sets25. Including 160 potential primary and secondary association signals, and excluding variants in the MHC region, 61 independent signals had a 99% credible set with fewer than 50 variants; 34 signals had credible sets with fewer than 20 variants. Only the association signal at NPNT (4q24) could be fine-mapped to single variant; however, in 17 other loci, a single variant had posterior probability of driving association (PPA) greater than 60% (Supplementary Table 5). Functional annotations of variants with PPA >= 60% are shown in Supplementary Table 5. Most variants overlapped genic enhancers of lung-related cell types (e.g., fetal lung fibroblasts, fetal lung, and adult lung fibroblasts) and were predicted to alter transcription binding motifs. Of 57 credible sets with fewer than 50 variants, nine sets contained at least one deleterious variant. These deleterious variants included 1) missense variants affecting TNS1, RIN3, GPR126, ADAM19, ATP13A2, BTC, and CRLF3; 2) a variant resulting in a premature stop codon affecting ITGB8; and 3) a splice donor variant affecting a lincRNA - AP003059.2.

Candidate target genesIn most cases, the closest gene to a lead SNP will not be the gene most likely to be the causal or effector gene of disease-associated variants51–53. Thus, to identify the potential effector genes underlying these genetic associations, we integrated additional molecular information including gene expression; gene regulation (open chromatin and methylation data), chromatin interaction, co-regulation of gene expression with gene sets and coding variant data.

At 82 loci, 481 genes were implicated by either gene expression (Bonferroni corrected at locus level), methylation (CLPP > 0.1), significant contact (most significant gene per cell types with P < 0.05), open chromatin regions (PP > 0.1), co-regulation of gene expression (FDR < 0.05), or deleterious variants detected by exome analysis. Excluding loci in the extended MHC region, the median numbers of potentially implicated genes per locus was four with a maximum of 18 genes (7q22.1). The median distance of implicated genes to top associated variants was 336 Kb, restricting to genes within 1 Mb of top associated variants. Among 82 loci, 59 loci (72%) included the nearest gene. Of 481 genes, 106 were implicated by at least two different sources. We identified 19 genes which were region-wise Bonferroni significant in previously published exome sequencing data. We summarize all genes implicated using these approaches in Supplementary Table 6.

Associated pathwaysTo gain further functional insight of associated genetic loci, we performed gene-set enrichment analysis using DEPICT. Among 165 enriched gene sets at FDR < 0.05, 44% of them were related to the developmental process term. Many gene sets were related to lung development including lung alveolus development (nominal P= 0.0003) and lung morphogenesis (nominal P= 0.0005). Interestingly, we also found enrichment of extracellular matrix -related pathways including laminin binding, integrin binding, mesenchyme development, cell-matrix adhesion, and actin filament bundles. Furthermore, there was enrichment of signaling pathways including histone deacetylase binding, Wnt receptor signaling pathway, SMAD binding, the MAPK cascade, and transmembrane receptor protein serine/threonine

Page 9: University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD,

kinase signaling pathway. Full enrichment analysis results including top genes for each DEPICT gene set are shown in Supplementary Table 7.

Genetic correlation of COPD and related traitsTo gain further insight of genetic contribution of COPD and other traits, we estimated genetic correlation among traits using LDSC in LD hub37,54. In addition to our previously reported correlations between COPD and lung function (FEV1 and FEV1/FVC), asthma, and height are significant (Supplementary Figure 5 and Supplementary Table 8). We additionally identified significant correlation of COPD and lung cancer (rg=0.184, P=0.003). We also tested genetic correlation between COPD and other comorbidities. We were not able to demonstrate genetic correlation of COPD with bone mineral density, major depressive disorder, coronary artery disease, or type 2 diabetes mellitus (Supplementary Figure 5 and Supplementary Table 8) We also identified nominally significant genetic correlation of COPD with hand grip, angina, gastroesophageal reflux disease, multiple sclerosis, and schizophrenia (Supplementary Figure 5 and Supplementary Table 8).

Identification of loci overlapping with asthma and pulmonary fibrosisBased on our previous identification of genetic overlap of COPD with asthma, and COPD with pulmonary fibrosis, we applied a Bayesian method to identify overlap (gwas-pw) of COPD associations from our current GWAS with previous GWASs of asthma and pulmonary fibrosis43,44. We identified 19 shared genome segments (posterior probability > 90%), 15 with asthma and 4 with pulmonary fibrosis (Figure 4). For asthma, 7 of 15 overlapping segments did not reach genome-wide significance in COPD. In the 15 genome segments overlapping between COPD and asthma, in 4 of the segments (variants near LOC401242, TSLP, C11orf30, and IL33) the overlap is explained by a single variant with posterior probability > 90%. Two genome segments (near DSP and ZKSCAN1) in the COPD and pulmonary fibrosis overlap, are explain by a single variant with posterior probability > 90%. For all six overlapping associations explained by a single variant (4 for the COPD/asthma overlap and 2 for the COPD/pulmonary fibrosis overlap), the variant for COPD and either asthma or pulmonary fibrosis had an opposite direction of effect (i.e., increasing risk of COPD but protective for IPF and asthma).

Phenotypic effects of known and novel associations for COPDTo characterize the phenotypic effects of 82 genome-wide significant loci, we performed a phenome-wide association analysis within the deeply phenotyped COPDGene study55. We tested the overall structure of associated phenotypes by using hierarchical clustering across scaled Z scores of associations. We identified two clusters of associated variants (Supplementary Figure 6). These two clusters could be differentiated using 14 features including computed tomography (CT) subtypes, CT-derived quantitative emphysema and airway phenotypes, intensity and status of smoking, diffusing capacity of carbon monoxide, asthma, and inflammatory biomarkers. We further performed clustering limited to imaging features. We found two clusters of associations, differentiating by quantitative emphysema and gas trapping (Figure 3). We also examined all genome-wide significant loci in the NHGRI-EBI GWAS Catalog56 (Supplementary Figure 7). Many variants associated with anthropometric measures including height and body mass index (BMI), various measurements on blood cells, and cancers (breast and prostate). Several variants also associated with comorbidities of COPD: atherosclerotic diseases (coronary artery disease and peripheral arterial disease), diabetes mellitus, bone density.

Page 10: University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD,

Drug repositioning using drug-gene expression signaturesGWAS is also useful for identifying drug targets either at individual gene48,57,58 or genome-wide level59,60. The recent development of richer datasets of drug-induced gene expression signatures47 and a statistical framework that utilizes genome-wide associations46 allow us to reposition approved or in-development drugs to COPD. This approach considers the similarity between disease and drug-induced gene expression signatures in an opposing pattern (i.e., use drug to reverse diseased gene expression signature). We approximated COPD gene expression patterns by calculating transcriptome-wide associations in lung tissue from genetic predictors45. We calculated standardized connectivity scores from drug-gene expression profiles across 2,837 compounds in nine cell lines (Connectivity Map database) using Query in clue.io47. We identified seven compounds with an opposing connectivity score >= 90%: leu-enkephalin (an opioid receptor agonist), huperizine-a (an acetylcholinesterase inhibitor), periplocymarin (an apoptosis stimulant), PAC-1 (a caspase activator), TER-14687 (an inhibitor of translocation of PKCq in T cells), vincristine (a tubulin inhibitor), and terreic-acid (a Bruton's tyrosine kinase (BTK) inhibitor).

DiscussionGenetic factors play an important role in COPD susceptibility; however, the majority of this genetic risk remains unexplained. Identifying new loci may help identify important disease pathways and also may help further explain the clinical heterogeneity seen in COPD. We examined genetic risk of COPD in a genome-wide association study including a total of 35,164 cases and 221,903 controls. We identified 82 genome-wide significant loci for COPD, including 16 previously identified in genome-wide association studies of COPD and additional 31 in population-based lung function, and 35 loci not previously described for COPD or lung function, of which 13 replicated in an independent study of population-based lung function.

Our study helps support the role of early life events in the risk of COPD. We identified many sources of evidence supporting a role of developmental processes in the pathogenesis of COPD. The first evidence came from gene set enrichment analysis, identifying lung morphogenesis and lung alveolar development gene sets. We identified the involvement of various pathways including the canonical Wnt receptor signaling pathway61,62, MAPK cascade, Ras protein signal transduction, and the nerve growth factor receptor signaling pathway. The second evidence emerged from the findings of importance of gene regulation at fetal stages. We identified enrichment of heritability in epigenomic marks of various fetal tissues, while fetal lung showing strongest signals. Lastly, we identified alveolar type 2 (AT2) cell, a regenerative cell in lung with surfactant production, as a putative target cell type for COPD. This finding could support the lung ageing hypothesis63 which considers the suboptimal response to injury to noxious stimuli such as cigarette smoking, and recent findings on the regenerative potential of AT2 cells64. Our findings are consistent with recent epidemiologic studies demonstrating that a substantial portion of the risk of COPD may be develop in early life: genetic findings may set initial lung function65 and patterns of growth and decline65–67.

In addition to genes directly related to lung development, we also identified many pathways related to supporting elements including mesenchyme development and extracellular matrix. For example, we identified ADAMTSL3 as a potential effector gene at a novel susceptibility locus. In addition to a role in development, this gene plays a role in cell-matrix interactions or in assembly of specific extracellular matrices68. Our study also supports additional pathways previously hypothesized to play a role in COPD,

Page 11: University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD,

some of which have been previously identified in studies of lung function69, including cilia structure70, elastin-associated microfibrils71, and retinoic acid receptor beta72.

We identified enrichment of epigenomic marks in lung tissue and smooth muscle (also identified in studies of lung function73); this latter association is driven by non-respiratory cell types, and suggests that more gene expression and regulatory mapping of both fetal and adult lung cells including in-depth characterization of cell lineages in the lung is needed to further characterize cell types affected by disease-associated variants. We identified many candidate genes at genome-wide significant loci. Combining five molecular datasets, we found 481 genes at 82 loci that were supported by at least one dataset, and 106 genes in at least two datasets. Among our novel findings are an association with the chitinase acidic (CHIA) gene at 1p13.3, which encodes a protein that degrades chitin74. Its expression is specific to lung75,76. Genetic variants in or around this gene were associated with baseline FEV1 and rate of FEV1 decline77, asthma78–81, acid mammalian chitinase activity80,82, and IgE79 although findings have been inconsistent83. Interleukin 17 receptor D (IL17RD) at 3p14.3 encodes a membrane protein belonging to the interleukin-17 receptor (IL-17R) protein family74. The gene product affects fibroblast growth factor signaling, inhibiting or stimulating growth through MAPK/ERK signaling74. It also interacts with TNF receptor 2 (TNFR2) to activate NF-κB84. Integrin subunit beta 8 (ITGB8) at 7p21.1 is a member of the integrin beta chain family and encodes a single-pass type I membrane protein that binds to an alpha subunit to form an integrin complex74. The complexes mediate cell-cell and cell-extracellular matrix interactions and this complex plays a role in human airway epithelial proliferation74 and repair85. Expression of this protein is increased in COPD86–88.

Characterization of genome-wide significant associations revealed heterogenous effect to COPD-related phenotypes and other biological processes. Within the well-phenotyped COPDGene cohort, we identified variable effects of these variants on emphysema and airway phenotypes which recapitulated clinically and epidemiologically-observed emphysema-predominant presentation of COPD89–91. We also identified other features including computed tomography (CT) subtypes, intensity and status of smoking, diffusing capacity of carbon monoxide, asthma, and inflammatory biomarkers, which are supported by previous findings [ref]. The identification of variable COPD risk loci associations with sub-phenotypes indicates possible implications for more nuanced approaches to therapy for COPD.

Analyzing over hundreds of traits in GWAS Catalog, we identified overlapping associations with various diseases/traits in multiple organ systems. Some overlapping diseases were recognized as comorbidities including coronary artery disease, bone mineral density, and type 2 diabetes mellitus. variable associations with multiple traits among COPD-associated loci again highlight the more heterogeneous and diverse biological mechanism as seen in other diseases92,93.

This study also enhances or understanding about relationship between COPD with asthma and pulmonary fibrosis. Extending from previously reported genetic correlations5, we identified 19 genetic loci shared between these pairs of diseases. In addition to two previously reported overlapping loci for COPD and IPF (near FAM13A and DSP), we identified two loci near ZKSCAN1 (7q22.1) and CRHR1 (17q21.31). We identified multiple overlapping loci with asthma including 2p24.3 (DDX1), 6p21.33, 6p22.1, 9p21.3 (ELAVL2), 19q13.32 (RSPH6A). The locus near DDX was associated in the combined analysis of COPD and asthma94, and is involved in NFkB pathway95, leading to a possible shared inflammatory mechanism between these diseases. We also identified genetic overlap between COPD and lung cancer, confirming previous findings96. In this study, we identified three overlapping loci

Page 12: University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD,

including 6p21, 15q25, and 19q13, which were previously reported97–100. The proportion of this shared risk due to smoking is not known; in epidemiologic studies, COPD confers greater risk to lung cancer independent of smoking101,102, with many shared plausible pathways97,100,102,10399. Further identification of genetic loci or pathways underlying this overlap is needed to understand underlying mechanism of this process104.

While our study is the largest genome-wide study of COPD, our use of population-based lung function for replication, along with pre-bronchodilator spirometry, could bias our findings against variants that are only associated with more severe forms of COPD. Based on prior analyses5, we believe that in most cases this bias is small. However, we cannot rule out a role for studying more severe disease; we note that the alpha-1 locus (SERPINA1) was identified as genome-wide significant in smaller studies of emphysema and in severe COPD in smokers. Power considerations in genome-wide association studies generally favor larger sample sizes and are relatively tolerant of phenotypic misclassification. Whether other subsets of subjects with COPD of sufficient phenotypic specificity and sample size can lead to identification of new loci remains to be determined. The largest sample size came from the UK Biobank; using UK Biobank as the test and the ICGC as the replication did not identify novel replicated genome-wide significant findings after Bonferroni correction. Our study focused on relatively common variants available through HRC imputation; additional more detailed studies of rare variants, regions such as HLA, and importantly, in other ethnicities, are warranted, but the latter, in particular are limited by currently available cohorts.

In conclusion, our work finds a substantial number of new loci for COPD, and via extensive bioinformatic analysis identifies potential genes and pathways for both existing and novel loci. Characterization of phenotypic effects of these loci suggests heterogeneity across COPD-associated loci as well as new findings of overlap with asthma and idiopathic pulmonary fibrosis. As the burden of COPD continues to increase, our study provides multiple avenues for new investigation for this deadly disease.

Page 13: University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD,

FiguresFigure 1 Study designCOPD, chronic obstructive pulmonary disease; FEV1, force expiratory volume in one second; FVC, forced vital capacity; ICGC, International COPD Genetics Consortium.

Figure 2 Manhattan plotLoci are labeled with the closest gene to the lead variant. Variants within 500 kb of the top associated variant at each novel locus are in indicated in orange.

Figure 3 Heatmap of associations of 82 index variants and imaging phenotypes in COPDGene. White lines represent putative groups of emphysema (bottom left), gas trapping (upper right), and a mixed group.

Page 14: University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD,

Figure 4 Shared genomic segments between COPD with asthma and idiopathic pulmonary fibrosisShared genomic segments between COPD with A) asthma and B) idiopathic pulmonary fibrosis. Genomic segments with posterior probability of association (PPA) greater than 0.6 are labeled.

Page 15: University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD,
Page 16: University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD,

TablesTable 1 Meta-analysis results showing 35 loci novel for COPD and lung function(see the Excel file)

References1 GBD 2015 Chronic Respiratory Disease Collaborators. Global, regional, and national deaths,

prevalence, disability-adjusted life years, and years lived with disability for chronic obstructive pulmonary disease and asthma, 1990-2015: a systematic analysis for the Global Burden of Disease Study 2015. Lancet Respir Med 2017; 5: 691–706.

2 Fuchsberger C, Flannick J, Teslovich TM, et al. The genetic architecture of type 2 diabetes. Nature 2016; 536: 41–7.

3 Jiang Z, Lao T, Qiu W, et al. A Chronic Obstructive Pulmonary Disease Susceptibility Gene, FAM13A, Regulates Protein Stability of beta-Catenin. Am J Respir Crit Care Med 2016; 194: 185–97.

4 Lao T, Jiang Z, Yun J, et al. Hhip haploinsufficiency sensitizes mice to age-related emphysema. Proc Natl Acad Sci U S A 2016; 113: E4681-7.

5 Hobbs BD, de Jong K, Lamontagne M, et al. Genetic loci associated with chronic obstructive pulmonary disease overlap with loci for lung function and pulmonary fibrosis. Nat Genet 2017; 49: 426–32.

6 Sudlow C, Gallacher J, Allen N, et al. UK biobank: an open access resource for identifying the causes of a wide range of complex diseases of middle and old age. PLoS Med 2015; 12: e1001779.

7 Miller MR, Hankinson J, Brusasco V, et al. Standardisation of spirometry. Eur Respir J 2005; 26: 319–38.

8 Vogelmeier CF, Criner GJ, Martinez FJ, et al. Global Strategy for the Diagnosis, Management, and Prevention of Chronic Obstructive Lung Disease 2017 Report. GOLD Executive Summary. Am J Respir Crit Care Med 2017; 195: 557–82.

9 Hankinson JL, Odencrantz JR, Fedan KB. Spirometric reference values from a sample of the general U.S. population. Am J Respir Crit Care Med 1999; 159: 179–87.

10 McCarthy S, Das S, Kretzschmar W, et al. A reference panel of 64,976 haplotypes for genotype imputation. Nat Genet 2016; 48: 1279–83.

11 Chang CC, Chow CC, Tellier LC, Vattikuti S, Purcell SM, Lee JJ. Second-generation PLINK: rising to the challenge of larger and richer datasets. Gigascience 2015; 4: 7.

12 Willer CJ, Li Y, Abecasis GR. METAL: fast and efficient meta-analysis of genomewide association scans. Bioinformatics 2010; 26: 2190–1.

Page 17: University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD,

13 Bulik-Sullivan BK, Loh P-R, Finucane HK, et al. LD Score regression distinguishes confounding from polygenicity in genome-wide association studies. Nat Genet 2015; 47: 291–5.

14 Tobacco and Genetics Consortium. Genome-wide meta-analyses identify multiple loci associated with smoking behavior. Nat Genet 2010; 42: 441–7.

15 Yang J, Ferreira T, Morris AP, et al. Conditional and joint multiple-SNP analysis of GWAS summary statistics identifies additional variants influencing complex traits. Nat Genet 2012; 44: 369–75, S1-3.

16 Pruim RJ, Welch RP, Sanna S, et al. LocusZoom: regional visualization of genome-wide association scan results. Bioinformatics 2010; 26: 2336–7.

17 Finucane HK, Bulik-Sullivan B, Gusev A, et al. Partitioning heritability by functional annotation using genome-wide association summary statistics. Nat Genet 2015; 47: 1228–35.

18 Finucane HK, Reshef YA, Anttila V, et al. Heritability enrichment of specifically expressed genes identifies disease-relevant tissues and cell types. Nat Genet 2018; 50: 621–9.

19 Lu Q, Powles RL, Abdallah S, et al. Systematic tissue-specific functional annotation of the human genome highlights immune-related DNA elements for late-onset Alzheimer’s disease. PLoS Genet 2017; 13: e1006933.

20 Slowikowski K, Hu X, Raychaudhuri S. SNPsea: an algorithm to identify cell types, tissues and pathways affected by risk loci. Bioinformatics 2014; 30: 2496–7.

21 Xu Y, Mizuno T, Sridharan A, et al. Single-cell RNA sequencing identifies diverse roles of epithelial cells in idiopathic pulmonary fibrosis. JCI insight 2016; 1: e90558.

22 Lavin Y, Kobayashi S, Leader A, et al. Innate Immune Landscape in Early Lung Adenocarcinoma by Paired Single-Cell Analyses. Cell 2017; 169: 750–765.e17.

23 Guo M, Wang H, Potter SS, Whitsett JA, Xu Y. SINCERA: A Pipeline for Single-Cell RNA-Seq Profiling Analysis. PLoS Comput Biol 2015; 11: e1004575.

24 Carlson M. org.Mm.eg.db: Genome wide annotation for Mouse. 2018.

25 Wakefield J. A Bayesian measure of the probability of false discovery in genetic epidemiology studies. Am J Hum Genet 2007; 81: 208–27.

26 Durinck S, Spellman PT, Birney E, Huber W. Mapping identifiers for the integration of genomic datasets with the R/Bioconductor package biomaRt. Nat Protoc 2009; 4: 1184–91.

27 Barbeira AN, Pividori MD, Zheng J, Wheeler HE, Nicolae DL, Im HK. Integrating Predicted Transcriptome From Multiple Tissues Improves Association Detection. bioRxiv 2018; published online Jan 1. http://biorxiv.org/content/early/2018/04/06/292649.abstract.

28 Hao K, Bossé Y, Nickle DC, et al. Lung eQTLs to help reveal the molecular underpinnings of asthma. PLoS Genet 2012; 8: e1003029.

29 Lamontagne M, Bérubé J-C, Obeidat M, et al. Leveraging lung tissue transcriptome to uncover candidate causal genes in COPD genetic associations. Hum Mol Genet 2018; 27: 1819–29.

30 Pers TH, Karjalainen JM, Chan Y, et al. Biological interpretation of genome-wide association studies using predicted gene functions. Nat Commun 2015; 6: 5890.

Page 18: University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD,

31 Shooshtari P, Huang H, Cotsapas C. Integrative Genetic and Epigenetic Analysis Uncovers Regulatory Mechanisms of Autoimmune Disease. Am J Hum Genet 2017; 101: 75–86.

32 Morrow JD, Glass K, Cho MH, et al. Human Lung DNA Methylation Quantitative Trait Loci Colocalize with COPD Genome-wide Association Loci. Am J Respir Crit Care Med 2018; published online Jan 9. DOI:10.1164/rccm.201707-1434OC.

33 Hormozdiari F, van de Bunt M, Segrè A V, et al. Colocalization of GWAS and eQTL Signals Detects Target Genes. Am J Hum Genet 2016; 99: 1245–60.

34 Schmitt AD, Hu M, Jung I, et al. A Compendium of Chromatin Contact Maps Reveals Spatially Active Regions in the Human Genome. Cell Rep 2016; 17: 2042–59.

35 Martin JS, Xu Z, Reiner AP, et al. HUGIn: Hi-C Unifying Genomic Interrogator. Bioinformatics 2017; 33: 3793–5.

36 Lee S, Fuchsberger C, Kim S, Scott L. An efficient resampling method for calibrating single and gene-based rare variant association analysis in case-control studies. Biostatistics 2016; 17: 1–15.

37 Zheng J, Erzurumluoglu AM, Elsworth BL, et al. LD Hub: a centralized database and web interface to perform LD score regression that maximizes the potential of summary level GWAS data for SNP heritability and genetic correlation analysis. Bioinformatics 2017; 33: 272–9.

38 Caliński T, Harabasz J. A dendrite method for cluster analysis. Commun Stat 1974; 3: 1–27.

39 Dimas AS, Lagou V, Barker A, et al. Impact of type 2 diabetes susceptibility variants on quantitative glycemic traits reveals mechanistic heterogeneity. Diabetes 2014; 63: 2158–71.

40 Friedman J, Hastie T, Tibshirani R. Regularization Paths for Generalized Linear Models via Coordinate Descent. J Stat Softw 2010; 33: 1–22.

41 Pickrell JK, Berisa T, Liu JZ, Ségurel L, Tung JY, Hinds DA. Detection and interpretation of shared genetic influences on 42 human traits. Nat Genet 2016; 48: 709–17.

42 Berisa T, Pickrell JK. Approximately independent linkage disequilibrium blocks in human populations. Bioinformatics 2016; 32: 283–5.

43 Fingerlin TE, Zhang W, Yang I V, et al. Genome-wide imputation study identifies novel HLA locus for pulmonary fibrosis and potential role for auto-immunity in fibrotic idiopathic interstitial pneumonia. BMC Genet 2016; 17: 74.

44 Demenais F, Margaritte-Jeannin P, Barnes KC, et al. Multiancestry association study identifies new asthma risk loci that colocalize with immune-cell enhancer marks. Nat Genet 2018; 50: 42–53.

45 Barbeira A, Dickinson SP, Torres JM, et al. Integrating tissue specific mechanisms into GWAS summary results. bioRxiv 2017.

46 So H-C, Chau CK-L, Chiu W-T, et al. Analysis of genome-wide association data highlights candidates for drug repositioning in psychiatry. Nat Neurosci 2017; 20: 1342–9.

47 Subramanian A, Narayan R, Corsello SM, et al. A Next Generation Connectivity Map: L1000 Platform and the First 1,000,000 Profiles. Cell 2017; 171: 1437–1452.e17.

48 Wain L V, Shrine N, Artigas MS, et al. Genome-wide association analyses for lung function and

Page 19: University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD,

chronic obstructive pulmonary disease identify new loci and potential druggable targets. Nat Genet 2017; 49: 416–25.

49 Agusti A, Soriano JB. COPD as a systemic disease. COPD 2008; 5: 133–8.

50 Barnes PJ, Celli BR. Systemic manifestations and comorbidities of COPD. Eur Respir J 2009; 33: 1165–85.

51 Visscher PM, Wray NR, Zhang Q, et al. 10 Years of GWAS Discovery: Biology, Function, and Translation. Am J Hum Genet 2017; 101: 5–22.

52 Zhou X, Baron RM, Hardin M, et al. Identification of a chronic obstructive pulmonary disease genetic determinant that regulates HHIP. Hum Mol Genet 2012; 21: 1325–35.

53 Claussnitzer M, Hui C-C, Kellis M. FTO Obesity Variant and Adipocyte Browning in Humans. N Engl J Med 2016; 374: 192–3.

54 Bulik-Sullivan B, Finucane HK, Anttila V, et al. An atlas of genetic correlations across human diseases and traits. Nat Genet 2015; 47: 1236–41.

55 Regan EA, Hokanson JE, Murphy JR, et al. Genetic epidemiology of COPD (COPDGene) study design. COPD 2010; 7: 32–43.

56 MacArthur J, Bowler E, Cerezo M, et al. The new NHGRI-EBI Catalog of published genome-wide association studies (GWAS Catalog). Nucleic Acids Res 2017; 45: D896–901.

57 Sanseau P, Agarwal P, Barnes MR, et al. Use of genome-wide association studies for drug repositioning. Nat Biotechnol 2012; 30: 317–20.

58 Lencz T, Malhotra AK. Targeting the schizophrenia genome: a fast track strategy from GWAS to clinic. Mol Psychiatry 2015; 20: 820–6.

59 Lamb J, Crawford ED, Peck D, et al. The Connectivity Map: using gene-expression signatures to connect small molecules, genes, and disease. Science 2006; 313: 1929–35.

60 Sirota M, Dudley JT, Kim J, et al. Discovery and preclinical validation of drug indications using compendia of public gene expression data. Sci Transl Med 2011; 3: 96ra77.

61 Skronska-Wasek W, Mutze K, Baarsma HA, et al. Reduced Frizzled Receptor 4 Expression Prevents WNT/β-Catenin-driven Alveolar Lung Repair in Chronic Obstructive Pulmonary Disease. Am J Respir Crit Care Med 2017; 196: 172–85.

62 Sakornsakolpat P, Morrow JD, Castaldi PJ, et al. Integrative genomics identifies new genes associated with severe COPD and emphysema. Respir Res 2018; 19: 46.

63 Agustí A, Faner R. COPD beyond smoking: new paradigm, novel opportunities. Lancet Respir Med 2018; 6: 324–6.

64 Zacharias WJ, Frank DB, Zepp JA, et al. Regeneration of the lung alveolus by an evolutionarily conserved epithelial progenitor. Nature 2018; 555: 251–5.

65 Bui DS, Lodge CJ, Burgess JA, et al. Childhood predictors of lung function trajectories and future COPD risk: a prospective cohort study from the first to the sixth decade of life. Lancet Respir Med 2018; published online April 5. DOI:10.1016/S2213-2600(18)30100-0.

Page 20: University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD,

66 McGeachie MJ, Yates KP, Zhou X, et al. Patterns of Growth and Decline in Lung Function in Persistent Childhood Asthma. N Engl J Med 2016; 374: 1842–52.

67 Ross JC, Castaldi PJ, Cho MH, et al. Longitudinal Modeling of Lung Function Trajectories in Smokers with and without COPD. Am J Respir Crit Care Med 2018; published online April 19. DOI:10.1164/rccm.201707-1405OC.

68 Hall NG, Klenotic P, Anand-Apte B, Apte SS. ADAMTSL-3/punctin-2, a novel glycoprotein in extracellular matrix related to the ADAMTS family of metalloproteases. Matrix Biol 2003; 22: 501–10.

69 Soler Artigas M, Loth DW, Wain L V, et al. Genome-wide association and large-scale follow up identifies 16 new loci influencing lung function. Nat Genet 2011; 43: 1082–90.

70 Yang J, Liu X, Yue G, Adamian M, Bulgakov O, Li T. Rootletin, a novel coiled-coil protein, is a structural component of the ciliary rootlet. J Cell Biol 2002; 159: 431–40.

71 Gibson MA, Hughes JL, Fanning JC, Cleary EG. The major antigen of elastin-associated microfibrils is a 31-kDa glycoprotein. J Biol Chem 1986; 261: 11429–36.

72 Massaro GD, Massaro D, Chan WY, et al. Retinoic acid receptor-beta: an endogenous inhibitor of the perinatal formation of pulmonary alveoli. Physiol Genomics 2000; 4: 51–7.

73 Wain L V, Shrine N, Miller S, et al. Novel insights into the genetics of smoking behaviour, lung function, and chronic obstructive pulmonary disease (UK BiLEVE): a genetic association study in UK Biobank. Lancet Respir Med 2015; 3: 769–81.

74 O’Leary NA, Wright MW, Brister JR, et al. Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Res 2016; 44: D733-45.

75 Saito A, Ozaki K, Fujiwara T, Nakamura Y, Tanigami A. Isolation and mapping of a human lung-specific gene, TSA1902, encoding a novel chitinase family member. Gene 1999; 239: 325–31.

76 Fagerberg L, Hallström BM, Oksvold P, et al. Analysis of the human tissue-specific expression by genome-wide integration of transcriptomics and antibody-based proteomics. Mol Cell Proteomics 2014; 13: 397–406.

77 Aminuddin F, Akhabir L, Stefanowicz D, et al. Genetic association between human chitinases and lung function in COPD. Hum Genet 2012; 131: 1105–14.

78 Birben E, Sackesen C, Kazani S, et al. The effects of an insertion in the 5’UTR of the AMCase on gene expression and pulmonary functions. Respir Med 2011; 105: 1160–9.

79 Chatterjee R, Batra J, Das S, Sharma SK, Ghosh B. Genetic association of acidic mammalian chitinase with atopic asthma and serum total IgE levels. J Allergy Clin Immunol 2008; 122: 202–8, 208.e1-7.

80 Ober C, Chupp GL. The chitinase and chitinase-like proteins: a review of genetic and functional studies in asthma and immune-mediated diseases. Curr Opin Allergy Clin Immunol 2009; 9: 401–8.

81 Heinzmann A, Brugger M, Bierbaum S, Mailaparambil B, Kopp M V, Strauch K. Joint influences of Acidic-Mammalian-Chitinase with Interleukin-4 and Toll-like receptor-10 with Interleukin-13 in the genetics of asthma. Pediatr Allergy Immunol 2010; 21: e679-86.

Page 21: University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD,

82 Okawa K, Ohno M, Kashimura A, et al. Loss and Gain of Human Acidic Mammalian Chitinase Activity by Nonsynonymous SNPs. Mol Biol Evol 2016; 33: 3183–93.

83 Wu AC, Lasky-Su J, Rogers CA, Klanderman BJ, Litonjua A. Polymorphisms of chitinases are not associated with asthma. J Allergy Clin Immunol 2010; 125: 754–7, 757.e1–757.e2.

84 Yang S, Wang Y, Mei K, et al. Tumor necrosis factor receptor 2 (TNFR2)·interleukin-17 receptor D (IL-17RD) heteromerization reveals a novel mechanism for NF-κB activation. J Biol Chem 2015; 290: 861–71.

85 Pilewski JM, Latoche JD, Arcasoy SM, Albelda SM. Expression of integrin cell adhesion receptors during human airway epithelial repair in vivo. Am J Physiol 1997; 273: L256-63.

86 Markovics JA, Araya J, Cambier S, et al. Interleukin-1beta induces increased transcriptional activation of the transforming growth factor-beta-activating integrin subunit beta8 through altering chromatin architecture. J Biol Chem 2011; 286: 36864–74.

87 Kitamura H, Cambier S, Somanath S, et al. Mouse and human lung fibroblasts regulate dendritic cell trafficking, airway inflammation, and fibrosis through integrin αvβ8-mediated activation of TGF-β. J Clin Invest 2011; 121: 2863–75.

88 Araya J, Cambier S, Markovics JA, et al. Squamous metaplasia amplifies pathologic epithelial-mesenchymal interactions in COPD patients. J Clin Invest 2007; 117: 3551–62.

89 Boschetto P, Miniati M, Miotto D, et al. Predominant emphysema phenotype in chronic obstructive pulmonary. Eur Respir J 2003; 21: 450–4.

90 Castaldi PJ, Dy J, Ross J, et al. Cluster analysis in the COPDGene study identifies subtypes of smokers with distinct patterns of airway disease and emphysema. Thorax 2014; 69: 415–22.

91 Cerveri I, Corsico AG, Grosso A, et al. The rapid FEV(1) decline in chronic obstructive pulmonary disease is associated with predominant emphysema: a longitudinal study. COPD 2013; 10: 55–61.

92 Hersh CP, Make BJ, Lynch DA, et al. Non-emphysematous chronic obstructive pulmonary disease is associated with diabetes mellitus. BMC Pulm Med 2014; 14: 164.

93 Higami Y, Ogawa E, Ryujin Y, et al. Increased Epicardial Adipose Tissue Is Associated with the Airway Dominant Phenotype of Chronic Obstructive Pulmonary Disease. PLoS One 2016; 11: e0148794.

94 Smolonska J, Koppelman GH, Wijmenga C, et al. Common genes underlying asthma and COPD? Genome-wide analysis on the Dutch hypothesis. Eur Respir J 2014; 44: 860–72.

95 Ishaq M, Ma L, Wu X, et al. The DEAD-box RNA helicase DDX1 interacts with RelA and enhances nuclear factor kappaB-mediated transcription. J Cell Biochem 2009; 106: 296–305.

96 Cohen BH, Diamond EL, Graves CG, et al. A common familial component in lung cancer and chronic obstructive pulmonary disease. Lancet (London, England) 1977; 2: 523–6.

97 Young RP, Hopkins RJ, Whittington CF, Hay BA, Epton MJ, Gamble GD. Individual and cumulative effects of GWAS susceptibility loci in lung cancer: associations after sub-phenotyping for COPD. PLoS One 2011; 6: e16476.

98 Yang L, Lu X, Qiu F, et al. Duplicated copy of CHRNA7 increases risk and worsens prognosis of

Page 22: University of Manchester - Expanded genetic … · Web viewand has been well-known for over half a century. However, while a few additional rare variants have been discovered in COPD,

COPD and lung cancer. Eur J Hum Genet 2015; 23: 1019–24.

99 Vermaelen K, Brusselle G. Exposing a deadly alliance: novel insights into the biological links between COPD and lung cancer. Pulm Pharmacol Ther 2013; 26: 544–54.

100 Wang DC, Shi L, Zhu Z, Gao D, Zhang Y. Genomic mechanisms of transformation from chronic obstructive pulmonary disease to lung cancer. Semin Cancer Biol 2017; 42: 52–9.

101 El-Zein RA, Young RP, Hopkins RJ, Etzel CJ. Genetic predisposition to chronic obstructive pulmonary disease and/or lung cancer: important considerations when evaluating risk. Cancer Prev Res (Phila) 2012; 5: 522–7.

102 Takiguchi Y, Sekine I, Iwasawa S, Kurimoto R, Tatsumi K. Chronic obstructive pulmonary disease as a risk factor for lung cancer. World J Clin Oncol 2014; 5: 660–6.

103 Lee G, Walser TC, Dubinett SM. Chronic inflammation, chronic obstructive pulmonary disease, and lung cancer. Curr Opin Pulm Med 2009; 15: 303–7.

104 Young RP, Hopkins RJ. How the genetics of lung cancer may overlap with COPD. Respirology 2011; 16: 1047–55.