Meta-analysis of exome array data identifies six novel genetic loci for lung function

Jackson, Victoria E ORCID: 0000-0002-9758-9784, Latourelle, Jeanne C, Wain, Louise V ORCID: 0000-0003-4951-1867, Smith, Albert V, Grove, Megan L, Bartz, Traci M, Obeidat, Ma'en ORCID: 0000-0002-5443-2752, Province, Michael A, Gao, Wei, Qaiser, Beenish
et al. (97 more authors) (2018) Meta-analysis of exome array data identifies six novel genetic loci for lung function. Wellcome Open Research, 3. p. 4.



Background: Over 90 regions of the genome have been associated with lung function to date, many of which have also been implicated in chronic obstructive pulmonary disease.

Methods: We carried out meta-analyses of exome array data and three lung function measures: forced expiratory volume in one second (FEV₁), forced vital capacity (FVC) and the ratio of FEV₁ to FVC (FEV₁/FVC). These analyses by the SpiroMeta and CHARGE consortia included 60,749 individuals of European ancestry from 23 studies and 7,721 individuals of African ancestry from 5 studies in the discovery stage, with follow-up in up to 111,556 independent individuals.

Results: We identified significant (P<2·8×10⁻⁷) associations with six SNPs: a nonsynonymous variant in RPAP1, which is predicted to be damaging; three intronic SNPs (in SEC24C, CASC17 and UQCC1); and two intergenic SNPs near LY86 and FGF10. Expression quantitative trait loci (eQTL) analyses found evidence for regulation of gene expression at three signals and implicated several genes, including TYRO3 and PLAU.

Conclusions: Further interrogation of these loci could provide greater understanding of the determinants of lung function and pulmonary disease.
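The referee responses in this record explain that CHARGE analysed raw traits while SpiroMeta analysed inverse normally transformed traits, so the two consortia's results could not be combined by inverse-variance weighting and a P-value based meta-analysis was used instead. The record does not say which P-value based scheme was applied. The sketch below assumes a sample-size weighted Z-score (Stouffer) method: each two-sided P-value is converted to a Z-score signed by the effect direction, weighted by √N, and summed. The function names and example inputs are hypothetical, not taken from the study.

```python
from math import sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution (mean 0, sd 1)


def signed_z(p_two_sided: float, effect_sign: int) -> float:
    """Convert a two-sided P-value and an effect direction (+1 or -1) into a signed Z-score."""
    if not 0.0 < p_two_sided <= 1.0:
        raise ValueError("P-value must be in (0, 1]")
    # -inv_cdf(p/2) is the upper-tail quantile; computed this way for accuracy at small p
    return effect_sign * -STD_NORMAL.inv_cdf(p_two_sided / 2.0)


def weighted_z_meta(results):
    """Sample-size weighted Z-score (Stouffer) meta-analysis (assumed scheme).

    results: iterable of (p_two_sided, effect_sign, n) tuples, one per study or consortium.
    Returns (combined_z, combined_two_sided_p).
    """
    weighted_sum = 0.0
    total_n = 0
    for p, sign, n in results:
        weighted_sum += sqrt(n) * signed_z(p, sign)
        total_n += n
    z = weighted_sum / sqrt(total_n)
    # Two-sided P-value from the lower tail, which avoids 1 - cdf cancellation
    return z, 2.0 * STD_NORMAL.cdf(-abs(z))


# Hypothetical inputs: the discovery sample split between the two consortia is not given here.
z, p = weighted_z_meta([(1e-4, +1, 30_000), (2e-3, +1, 38_000)])
```

Because only P-values, effect directions and sample sizes enter the calculation, the method tolerates effect estimates on different scales, which is the situation the responses describe.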

Item Type: Article
Additional Information: referee-status: Approved with reservations, Approved
referee-response-29790: 10.21956/wellcomeopenres.13627.r29790, Rachel M. Freathy, Robin Beaumont, Institute of Biomedical and Clinical Science, University of Exeter, Exeter, UK, 25 Jan 2018, version 1, 1 approved, 1 approved with reservations
referee-comment-3629: Victoria Jackson; Posted: 12 Jun 2018; We thank the reviewers for their helpful comments. We have addressed each specific comment below and amended the manuscript correspondingly.
Reviewer comment 1: The phenotypes seem to have been adjusted for covariates and ancestry-specific principal components prior to being inverse normally transformed. This transformation has the potential to introduce correlations between principal components and the inverse normally transformed phenotype. Since one of the SNPs identified as being associated with the phenotype is known to vary in frequency across European populations, and the authors note that they cannot rule out the effects of population structure on the identified associations, this raises concerns that some of the other associations could also be artefacts driven by failure to properly account for population stratification. It should explicitly be mentioned in the methods whether adjustments for ancestry-specific principal components were made prior to inverse normal transformation of the phenotype in the SpiroMeta Consortium component of the meta-analysis, or whether they were included as covariates in the phenotype-SNP association analysis.
Response: In the SpiroMeta Consortium component of the analyses, adjustment for ancestry principal components (PCs) was not undertaken prior to transformation; rather, PCs were adjusted for when fitting the SNP-trait associations. This was ambiguous in the text, so we have amended the methods accordingly (Statistical analyses section, new wording below).
Given that the adjustment for ancestry PCs was undertaken after phenotype transformation, we do not expect the transformation to have introduced a correlation between the transformed trait and population structure. The amended methods read: "Traits were adjusted for sex, age, age² and height, and inverse normally transformed prior to association testing. For studies with unrelated individuals, SNP-trait associations were tested using linear models, with adjustments made for the first 10 ancestry principal components, whilst studies with related individuals utilised linear mixed models to account for familial relationships and underlying population structure."
Reviewer comment 2: Indeed, in the replication analysis in UK Biobank, principal components were adjusted for prior to inverse normally transforming the data. Was genotyping chip adjusted for in this cohort (which should be done in the phenotype-SNP analysis)? The UK BiLEVE chip was enriched for smokers, which could affect association analyses unless chip is included as a covariate. In addition, the interim data release (which seems to be what is used here; please clarify in the methods whether the data come from the interim (2015) or full (2017) data release) featured some discrepancies between the two chips, which can introduce spurious associations, especially if adjustment is not made for genotyping chip.
Response: In the UK Biobank data, principal components (PCs) were adjusted for prior to transformation. As a sensitivity analysis, we have repeated the analysis for the six reported SNPs (the LCT SNP was not available in UK Biobank), transforming the phenotypes first and then adjusting for all covariates (including PCs) in the SNP-trait association test. For comparison, we have done this for all six SNPs with all three traits.
Comparisons of these two analyses (without vs. with covariate adjustment prior to transformation) are shown here. For each SNP, the P-value comparison is highlighted for the trait with which we report the association, and the dashed lines indicate the Bonferroni-corrected significance threshold for independent replication (P<1·47×10⁻³). Whilst the P-values differ for some SNP-trait combinations (with more significant P-values in the analysis with covariate adjustment prior to transformation for 5 of the 6 SNPs), all six SNPs meet the replication P-value threshold in both analyses. We have clarified in the methods (Study design, cohorts and genotyping section) that the UK Biobank data used were from the 2015 interim release. The UK Biobank analysis was stratified by smoking status (ever and never) and by chip (UK BiLEVE array and UK Biobank array). As the previous methods did not make clear that the analysis was stratified by chip, we have now stated this explicitly. We have also tested whether any of the six reported SNPs available in UK Biobank had different MAFs in the UK BiLEVE and UK Biobank samples (suggestive of a chip effect); none showed evidence of this.
Reviewer comment 3: Why was the raw trait used in CHARGE but the inverse normalised trait in the SpiroMeta Consortium? This seems an odd choice.
Response: We agree that using the raw trait in CHARGE and the transformed trait in SpiroMeta was not ideal; however, combining the results of the two consortia was not planned from the outset. By the time we had decided to combine them, all studies had already completed their analyses, and it was not feasible for the contributing studies to repeat the analyses with (or without) the transformation, as this would have required substantial reanalysis.
Since the effect estimates were not on the same scale, we could not perform an inverse-variance weighted meta-analysis; we therefore performed a P-value based meta-analysis. This analysis should be valid given that appropriate analyses were undertaken within each consortium.
Reviewer minor concern 1: In the discussion, the authors mention that the 6 identified SNPs not attributed to population structure passed the Bonferroni significance threshold. They then mention that the SNPs ALSO pass Bonferroni-corrected significance thresholds in the replication analysis. This could be misleading, since not all SNPs passed the Bonferroni threshold in the discovery-only dataset.
Response: We have reworded this section of the discussion as follows: "There were six SNPs which reached P<10⁻⁵ in the discovery stage meta-analysis of single variant associations, and subsequently met the Bonferroni corrected significance threshold for independent replication (P<1·47×10⁻³, corrected for 34 SNPs being tested). In the combined analyses of our discovery and replication analyses, these six SNPs met the exome chip-wide significance threshold (P<2·8×10⁻⁷)."
Reviewer minor concern 2: The authors mention that correction was made for the genomic inflation statistic (λ), but we could not find the statistics relating to this. The figures should be given in the manuscript.
Response: We have added these statistics to the supplement as Supplementary Table 13.
referee-response-30984: 10.21956/wellcomeopenres.13627.r30984, Lisa Strug, Naim Panjwani, Research Institute, Hospital for Sick Children, Toronto, ON, Canada, 04 Apr 2018, version 1, 1 approved, 1 approved with reservations
referee-comment-3628: Victoria Jackson; Posted: 12 Jun 2018; We thank the second set of reviewers for their helpful comments. Again, we have addressed specific points below and made appropriate amendments to the manuscript.
Reviewer comment 1: Two rare variant tests were chosen and applied to the data as opposed to choosing a combined test (e.g. Derkach et al 2013 Genetic Epidemiology). A combined test would be more powerful.
Response: We agree that a combined test would have been the preferred choice for gene-based association testing. In this instance, however, the gene-based tests were chosen for practical reasons: SKAT and WST, the two tests used, were both implemented in the meta-analysis software used by the two contributing consortia (RAREMETAL and seqMeta). Since this was a meta-analysis and only summary statistics were available for each study, we were restricted to the gene-based tests implemented in these two packages at the time of the meta-analyses. For example, the method suggested by Derkach et al. requires permutation to calculate P-values with adequately controlled type 1 error, which would not have been possible with the available summary statistics.
Reviewer comment 2: The authors should explain why there was an inverse normalization of the traits in SpiroMeta but not in CHARGE, and provide some sensitivity analysis.
Response: As mentioned in response to the other reviewers' comments, we agree that using the raw trait in CHARGE and the transformed trait in SpiroMeta was not optimal; by the time we had decided to combine the results from the two consortia, all studies had already completed their analyses, and reanalysis across the many cohorts would not have been feasible.
Reviewer comment 3: There appear to be very large differences in effect allele frequencies between the discovery and replication samples. Do the authors have an explanation for this? This might point to local ancestry differences that could be relevant, and should be further investigated.
Response: Thank you for highlighting this. There was an error in the effect allele frequencies for the replication samples in Supplementary Table 2; these have now been corrected, and the allele frequencies are more consistent between the discovery and replication samples. The remaining differences occur where the discovery meta-analysis included individuals of both European and African ancestry, whereas the replication dataset included individuals of European ancestry only.
Reviewer comment 4: The eQTL analysis could formally investigate colocalization as opposed to cross-referencing individual associated SNPs with public repositories, and there are several different methods that achieve this goal: e.g. COLOC, eCAVIAR, Sherlock, RTC or EnLoc.
Response: Tests of colocalisation are more usually undertaken in dense genome-wide data, whereas the (often rare) putative causal variants included on the exome array in our study were relatively sparsely distributed. Furthermore, we did not have access to the lung eQTL data required to undertake tests of colocalisation. We now acknowledge in the discussion that the eQTL analysis did not include formal tests of colocalisation; in the example we highlight, the variants are in complete LD.
Reviewer comment 5: In the replication analyses section, it is stated that "Traits were adjusted for age, age², height, sex, ten principal components and pack-years (ever smokers only), and inverse normally transformed." For clarity, the authors should be specific about whether the trait (FEV1, FVC, or FEV1/FVC) was inverse normalized first and age, age², sex and 10 PCs were then added as covariates in the genetic association model.
Response: We have clarified in the methods for the replication analysis that "Traits were adjusted for age, age², height, sex, ten principal components and pack-years (ever smokers only), and the adjusted traits were inverse normally transformed."
Reviewer comment 6: In the methods section for the rare variant testing, SKAT appears to be incorrectly referred to as a Fisher's combined method.
Response: Within each consortium we generated results for SKAT. We subsequently combined the SKAT results from the two consortia using Fisher's method for combining P-values. We have clarified this in the text as "For genes which contained at least 2 polymorphic SNPs in both consortia, we combined the results of the consortium level gene based tests using either z-score meta-analysis (for the WST analysis) or Fisher's Method for combining P-values (in the case of SKAT)."
Reviewer comment 7: The authors should provide the justification for their various significance criteria used in each of the analyses.
Response: Justification for the SNPs and genes taken forward to the replication stage has now been added to the methods: "We identified SNPs of interest as those with an overall P<10⁻⁵ and a consistent direction of effect and P<0·05 observed in both consortia. Rather than using a strict Bonferroni correction for defining the significance threshold, we adopted the more lenient P<10⁻⁵ threshold in order to increase the power to detect variants with modest effect in our discovery analyses, whilst the requirement for consistency in results from the two consortia aimed to limit false positives. All SNPs meeting these thresholds were followed up in independent replication cohorts." "We identified genes of interest as those with P<0·05 observed in both consortia and an overall P<10⁻⁴, thresholds again chosen to limit both false positive and false negative findings." The overall thresholds for the combined discovery and replication analyses were based on Bonferroni-corrected thresholds, as already stated in the text.
Reviewer comment 8: The authors should list the MAF alongside the p-values reported in the text for clarity for the single variant analysis results.
Response: MAFs and P-values have now been added to the main text for all reported loci.
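The responses above describe one P-value combination rule and several selection thresholds. The sketch below expresses them in Python under stated assumptions: Fisher's method for combining the two consortia's SKAT P-values (as the response to comment 6 describes; the z-score meta-analysis used for WST is not shown), the discovery-stage SNP and gene filters quoted in the response to comment 7, and the Bonferroni replication threshold. All function names are hypothetical. The family-wise α = 0.05 is inferred from the reported threshold (0.05/34 ≈ 1·47×10⁻³) rather than stated in the record.

```python
from math import exp, factorial, log


def fisher_combined_p(p_values):
    """Fisher's method: X = -2 * sum(ln p) follows chi-squared with 2k df under the null.

    For even degrees of freedom the chi-squared upper tail has the closed form
    exp(-X/2) * sum_{j=0}^{k-1} (X/2)**j / j!, so no special functions are needed.
    """
    if not p_values or any(not 0.0 < p <= 1.0 for p in p_values):
        raise ValueError("need at least one P-value, each in (0, 1]")
    half_x = -sum(log(p) for p in p_values)  # this is X / 2
    return exp(-half_x) * sum(half_x ** j / factorial(j) for j in range(len(p_values)))


def bonferroni_threshold(alpha, n_tests):
    """Per-test threshold that keeps the family-wise error rate at alpha."""
    return alpha / n_tests


def snp_of_interest(overall_p, p_a, p_b, beta_a, beta_b):
    """Discovery filter: overall P<1e-5, P<0.05 in both consortia, same effect direction."""
    same_direction = beta_a * beta_b > 0
    return overall_p < 1e-5 and p_a < 0.05 and p_b < 0.05 and same_direction


def gene_of_interest(skat_p_a, skat_p_b):
    """SKAT gene filter: P<0.05 in both consortia and Fisher-combined P<1e-4."""
    overall = fisher_combined_p([skat_p_a, skat_p_b])
    return skat_p_a < 0.05 and skat_p_b < 0.05 and overall < 1e-4


# Replication threshold for the 34 SNPs followed up (alpha = 0.05 is an inference).
replication_threshold = bonferroni_threshold(0.05, 34)
```

For one P-value, Fisher's method returns that P-value unchanged; for two consortia it reduces to p₁p₂(1 − ln(p₁p₂)), which is what the closed-form sum computes when k = 2.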
grant-information: MDT has been supported by MRC fellowships G0501942 and G0902313. MDT and LVW are supported by the MRC (MR/N011317/1). IPH is supported by the MRC (G1000861). ALW and SJL are supported by the Intramural Research Program of the NIH, National Institute of Environmental Health Sciences (ZIA ES 043012). We acknowledge use of phenotype and genotype data from the British 1958 Birth Cohort DNA collection, funded by the Medical Research Council grant G0000934 and the Wellcome Trust grant 068545/Z/02. APM is a Wellcome Trust Senior Fellow in Basic Biomedical Science (grant number WT098017) and also supported by Wellcome Trust grant WT064890. EI is supported by the Swedish Research Council (2012-1397), Knut och Alice Wallenberg Foundation (2013.0126) and the Swedish Heart-Lung Foundation (20140422). JK is supported by Academy of Finland Center of Excellence in Complex Disease Genetics grants 213506, 129680 and Academy of Finland grants 265240, 263278. The Finnish Twin Cohort is supported by the Wellcome Trust Sanger Institute, UK. The Lothian Birth Cohort is supported by Age UK (The Disconnected Mind Project), the UK Medical Research Council (MR/K026992/1) and The Royal Society of Edinburgh. ÅJ is supported by the Swedish Society for Medical Research (SSMF), The Kjell och Märta Beijers Foundation, The Marcus Borgström Foundation, The Åke Wiberg foundation and The Vleugels Foundation. UG is supported by Swedish Medical Research Council grants K2007-66X-20270-01-3 and 2011-2354 and European Commission FP6 (LSHG-CT-2006-01947).
SHIP is part of the Community Medicine Research net of the University of Greifswald, Germany, which is funded by the Federal Ministry of Education and Research, the Ministry of Cultural Affairs as well as the Social Ministry of the Federal State of Mecklenburg-West Pomerania, and the network ‘Greifswald Approach to Individualized Medicine (GANI_MED)’ funded by the Federal Ministry of Education and Research, and the German Asthma and COPD Network (COSYCONET) (grant nos. 01ZZ9603, 01ZZ0103, 01ZZ0403, 03IS2061A, BMBF 01GI0883). ExomeChip data have been supported by the Federal Ministry of Education and Research (grant no. 03Z1CN22) and the Federal State of Mecklenburg-West Pomerania. The University of Greifswald is a member of the Caché Campus program of the InterSystems GmbH. UKHLS is supported by grants WT098051 (Wellcome Trust) and ES/H029745/1 (Economic and Social Research Council). Y.B. holds a Canada Research Chair in Genomics of Heart and Lung Diseases. Lies Lahousse is a Postdoctoral Fellow of the Research Foundation - Flanders (FWO grant G035014N). The Rotterdam Study is funded by Erasmus Medical Center and Erasmus University, Rotterdam, the Netherlands Organization for Scientific Research (NWO), the Netherlands Organization for Health Research and Development (ZonMw), the Research Institute for Diseases in the Elderly (RIDE), the Ministry of Education, Culture and Science, the Ministry for Health, Welfare and Sports, the European Commission (DG XII), and the Municipality of Rotterdam. Genotyping in the Rotterdam Study was supported by the Netherlands Organization for Scientific Research (NWO grants 175.010.2005.011; 911-03-305 012), the Research Institute for Diseases in the Elderly (RIDE2 grants 014-93-015) and the Netherlands Genomics Initiative (NGI)/Netherlands Consortium for Healthy Aging (NCHA grant 050-060-810).
MESA/MESA SHARe is supported by HHS (HHSN268201500003I), NIH/NHLBI (contracts N01-HC-95159, N01-HC-95160, N01-HC-95161, N01-HC-95162, N01-HC-95163, N01-HC-95164, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169) and NIH/NCATS (contracts UL1-TR-000040, UL1-TR-001079, UL1-TR-001881, DK063491). MESA SHARe is funded by NIH/NHLBI contract N02-HL-64278, MESA Air is funded by US EPA (RD831697) and MESA Spirometry is funded by NIH/NHLBI (R01-HL077612). SSR and BMP are supported by the NIH/NHLBI grant "Rare variants and NHLBI traits in deeply phenotyped cohorts" (R01-HL120393). Cardiovascular Health Study: This CHS research was supported by NHLBI contracts HHSN268201200036C, HHSN268200800007C, N01HC55222, N01HC85079, N01HC85080, N01HC85081, N01HC85082, N01HC85083, N01HC85086; and NHLBI grants U01HL080295, R01HL068986, R01HL087652, R01HL105756, R01HL103612, R01HL120393, and R01HL130114 with additional contribution from the National Institute of Neurological Disorders and Stroke (NINDS). Additional support was provided through R01AG023629 from the National Institute on Aging (NIA). The provision of genotyping data was supported in part by the National Center for Advancing Translational Sciences, CTSI grant UL1TR000124, and the National Institute of Diabetes and Digestive and Kidney Disease Diabetes Research Center (DRC) grant DK063491 to the Southern California Diabetes Endocrinology Research Center. The Atherosclerosis Risk in Communities (ARIC) study is carried out as a collaborative study supported by the National Heart, Lung, and Blood Institute (NHLBI) contracts (HHSN268201100005C, HHSN268201100006C, HHSN268201100007C, HHSN268201100008C, HHSN268201100009C, HHSN268201100010C, HHSN268201100011C, and HHSN268201100012C). Funding support for “Building on GWAS for NHLBI-diseases: the U.S. CHARGE consortium” was provided by the NIH through the American Recovery and Reinvestment Act of 2009 (ARRA) (5RC2HL102419).
DOMK received funding from the Dutch Science Organisation (ZonMW-VENI Grant 916.14.023). The genotyping in the NEO study was supported by the Centre National de Génotypage (Paris, France), headed by Jean-François Deleuze. The NEO study is supported by the participating Departments, the Division and the Board of Directors of the Leiden University Medical Center, and by the Leiden University, Research Profile Area Vascular and Regenerative Medicine. SAPALDIA was supported by the Swiss National Science Foundation (grant nos. 33CS30-148470/1, 33CSCO-134276/1, 33CSCO-108796, 324730_135673, 3247BO-104283, 3247BO-104288, 3247BO-104284, 3247-065896, 3100-059302, 3200-052720, 3200-042532, 4026-028099, PMPDP3_129021/1, PMPDP3_141671/1), the Federal Office for the Environment, the Federal Office of Public Health, the Federal Office of Roads and Transport, the cantonal governments of Aargau, Basel-Stadt, Basel-Land, Geneva, Luzern, Ticino, Valais, and Zürich, the Swiss Lung League, the cantonal Lung Leagues of Basel-Stadt/Basel-Landschaft, Geneva, Ticino, Valais, Graubünden and Zurich, Stiftung ehemals Bündner Heilstätten, SUVA, Freiwillige Akademische Gesellschaft, UBS Wealth Foundation, Talecris Biotherapeutics GmbH, Abbott Diagnostics, European Commission 018996 (GABRIEL), Wellcome Trust WT 084703MA. The Novo Nordisk Foundation Center for Basic Metabolic Research is an independent Research Center at the University of Copenhagen partially funded by an unrestricted donation from the Novo Nordisk Foundation. Generation Scotland received core support from the Chief Scientist Office of the Scottish Government Health Directorates [CZD/16/6] and the Scottish Funding Council [HR03006]. Genotyping of the GS:SFHS samples was carried out by the Genetics Core Laboratory at the Edinburgh Clinical Research Facility, University of Edinburgh, Scotland and was funded by the Medical Research Council UK.
The Croatia KORCULA study was supported by the Ministry of Science, Education and Sport in the Republic of Croatia (108-1080315-0302). JD, JCL, WG and GTOC are supported by NIH/NHLBI Contract HHSN268201500001I. Genotyping, quality control and calling of the Illumina HumanExome BeadChip in the Framingham Heart Study was supported by funding from the National Heart, Lung and Blood Institute Division of Intramural Research (Daniel Levy and Christopher J. O’Donnell, Principal Investigators). The AGES study is supported by the NIH (N01-AG012100), the Iceland Parliament (Alþingi) and the Icelandic Heart Association. HABC was supported by NIA contracts N01AG62101, N01AG62103, and N01AG62106; NIA grant R01-AG028050, and NINR grant R01-NR012459 and was supported in part by the Intramural Research Program of the NIH, National Institute on Aging. The HABC genome-wide association study was funded by NIA grant 1R01AG032098-01A1 and genotyping services were provided by the Center for Inherited Disease Research (CIDR). CIDR is fully funded through a federal contract from the National Institutes of Health to The Johns Hopkins University, contract number HHSN268200782096C. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. copyright-info: This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
Depositing User: Symplectic Admin
Date Deposited: 08 May 2019 11:35
Last Modified: 19 Jan 2023 00:52
DOI: 10.12688/wellcomeopenres.12583.1