Novel blood pressure locus and gene discovery using GWAS and expression datasets from blood and the kidney




ABSTRACT

Elevated blood pressure is a major risk factor for cardiovascular disease and has a substantial genetic contribution. Genetic variation influencing blood pressure has the potential to identify new pharmacological targets for the treatment of hypertension. To discover additional novel blood pressure loci, we used 1000 Genomes Project-based imputation in 150,134 European ancestry individuals and sought significant evidence for independent replication in a further 228,245 individuals. We report 6 new signals of association in or near HSPB7, TNXB, LRP12, LOC283335, SEPT9 and AKT2, and provide new replication evidence for a further 2 signals in EBF2 and NFKBIA. Combining large whole-blood gene expression resources totaling 12,607 individuals, we investigated all novel and previously reported signals and identified 48 genes with evidence for involvement in BP regulation that are significant in multiple resources. Three novel kidney-specific signals were also detected. These robustly implicated genes may provide new leads for therapeutic innovation.


INTRODUCTION

Genetic support for a drug target increases the likelihood of success in drug development 1 and there is clear unmet need for novel therapeutic strategies to treat individuals with hypertension 2. A number of large studies have described blood pressure (BP) variant identification by genome-wide and targeted association approaches 3-19. Clinically the most predictive BP traits for cardiovascular risk are systolic blood pressure (SBP) and diastolic blood pressure (DBP), reflecting roughly the peak and trough of the BP curve, and pulse pressure (PP), the difference between SBP and DBP 20 reflecting arterial stiffness. Using these three traits, we undertook a meta-analysis of 150,134 individuals from 54 genome-wide association studies of European ancestry with imputation based on the 1000 Genomes Project Phase 1. To minimize reporting of false positive associations, we sought stringent evidence for significant independent replication in a further 228,245 individuals. We further followed up novel and previously reported association signals in multiple large gene expression databases and the largest kidney tissue gene expression resource currently available. Finally, we searched for enrichment of associated genes in biological pathways and gene sets and identified whether any of the genes were known drug targets or had tool molecules.



MATERIALS AND METHODS

Studies Stage 1

Results from 54 independent European-ancestry studies, totaling 150,134 individuals, were included in the Stage 1 meta-analysis: AGES (n=3215), ARIC (n=9402), ASPS (n=828), B58C (n=6458), BHS (n=4492), CHS (n=3254), Cilento study (n=999), COLAUS (n=5404), COROGENE-CTRL (n=1878), CROATIA-Vis (n=945), CROATIA-Split (n=494), CROATIA-Korcula (n=867), EGCUT (n=6395), EGCUT2 (n=1844), EPIC (n=2100), ERF (n=2617), Fenland (n=1357), FHS (n=8096), FINRISK-ctrl (n=861), FINRISK CASE (n=839), FUSION (n=1045), GRAPHIC (n=1010), H2000-CTRL (n=1078), HealthABC (n=1661), HTO (n=1000), INGI-CARL (n=456), INGI-FVG (n=746), INGI-VB (n=1775), IPM (n=300), KORAS3 (n=1590), KORAS4 (n=3748), LBC1921 (n=376), LBC1936 (n=800), LOLIPOP-EW610 (n=927), MESA (n=2678), MICROS (n=1148), MIGEN (n=1214), NESDA (n=2336), NSPHS (n=1005), NTR (n=1490), PHASE (n=4535), PIVUS (n=945), PROCARDIS (n=1652), SHIP (n=4068), ULSAM (n=1114), WGHS (n=23049), YFS (n=1987), ORCADES (n=1908), RS1 (n=5645), RS2 (n=2152), RS3 (n=3018), TRAILS (n=1262), TRAILS-CC (n=282) and TWINGENE (n=9789). Full study names and general study information are given in Supplementary Table 1.


Study-level genotyping and association testing

Three quantitative BP traits were analyzed: SBP, DBP, and PP (difference between SBP and DBP). Within each study, individuals known to be taking anti-hypertensive medication had 15 mmHg added to their raw SBP value and 10 mmHg added to their raw DBP value 21. A summary of BP phenotypes in each study is given in Supplementary Table 2. Association testing was undertaken according to a central analysis plan that specified the use of sex, age, age², and body mass index (BMI) as covariates and optional inclusion of additional covariates to account for population stratification (Supplementary Table 3). Trait residuals were calculated for each trait using a normal linear regression of the medication-adjusted trait values (mmHg) onto all covariates. The genotyping array, pre-imputation quality control filters, imputation software and association testing software used by each study are listed in Supplementary Table 4. All studies imputed to the 1000 Genomes Project Phase 1 integrated release version 3 [March 2012] all ancestry reference panel 22. Imputed genotype dosages were used to take into account uncertainty in the imputation. Association testing was carried out using linear regression of the trait residuals onto genotype dosages under an additive genetic model. Methods to account for relatedness within a study were used where appropriate (Supplementary Table 3). Results for all variants (SNPs and INDELs) were then returned to the central analysis group for further quality control checks and meta-analysis.
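The medication adjustment described above can be illustrated with a minimal Python sketch. The function and class names are hypothetical, and computing PP from the medication-adjusted SBP and DBP is an assumption, since the text only defines PP as the difference between SBP and DBP. The covariate regression step is not shown.

```python
from dataclasses import dataclass

# Offsets stated in the text for individuals on anti-hypertensive medication.
SBP_MEDICATION_OFFSET = 15.0  # mmHg added to raw SBP
DBP_MEDICATION_OFFSET = 10.0  # mmHg added to raw DBP


@dataclass
class AdjustedBP:
    sbp: float
    dbp: float
    pp: float


def adjust_for_medication(raw_sbp: float, raw_dbp: float, on_medication: bool) -> AdjustedBP:
    """Return medication-adjusted SBP and DBP, plus PP.

    Assumption: PP is taken as adjusted SBP minus adjusted DBP.
    """
    sbp = raw_sbp + (SBP_MEDICATION_OFFSET if on_medication else 0.0)
    dbp = raw_dbp + (DBP_MEDICATION_OFFSET if on_medication else 0.0)
    return AdjustedBP(sbp=sbp, dbp=dbp, pp=sbp - dbp)
```

For example, a treated individual with raw readings of 140/90 mmHg would enter the regression with SBP 155, DBP 100 and PP 55 mmHg under these assumptions.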



Stage 1 meta-analysis

Central quality control checks were undertaken across all results sets. These included checks of allele frequency consistency (across studies and with reference populations), checks of effect size and standard error distributions (i.e. to highlight phenotype issues), and generation of quantile-quantile (QQ) plots and genomic inflation factor lambdas to check for over- or under-inflation of test statistics. Genomic control was applied at study level (if lambda>1). Variants with imputation quality <0.3 were excluded prior to meta-analysis. Inverse variance weighted meta-analysis was undertaken. After meta-analysis, variants with a weighted minor allele frequency of less than 1% or an effective sample size (N effective, the product of study sample size and imputation quality summed across contributing studies) below 60% of the total were excluded. The meta-analysis genomic control lambda was then calculated and used to adjust the meta-analysis results.
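The core arithmetic of this step can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the software used by the study: it assumes a fixed-effect inverse-variance weighted estimator, lambda estimated as the median chi-squared statistic divided by the median of a 1-df chi-squared distribution, and genomic control applied by inflating standard errors by the square root of lambda. All function names are hypothetical.

```python
import math
from statistics import median

CHI2_1DF_MEDIAN = 0.4549364  # median of a chi-squared distribution with 1 df


def genomic_lambda(betas, ses):
    """Genomic inflation factor from per-variant effect sizes and standard errors."""
    chi2 = [(b / s) ** 2 for b, s in zip(betas, ses)]
    return median(chi2) / CHI2_1DF_MEDIAN


def apply_genomic_control(se, lam):
    """Inflate a standard error by sqrt(lambda), only when lambda > 1."""
    return se * math.sqrt(lam) if lam > 1.0 else se


def ivw_meta(estimates):
    """Fixed-effect inverse-variance weighted meta-analysis.

    estimates: list of (beta, se) pairs, one per study, for a single variant.
    Returns the pooled (beta, se).
    """
    weights = [1.0 / se ** 2 for _, se in estimates]
    total = sum(weights)
    beta = sum(w * b for w, (b, _) in zip(weights, estimates)) / total
    return beta, math.sqrt(1.0 / total)


def n_effective_fraction(studies, total_n):
    """Sum of (sample size x imputation quality) over contributing studies,
    expressed as a fraction of the total sample size."""
    return sum(n * info for n, info in studies) / total_n
```

With two equally precise studies reporting effects of 1.0 and 3.0 mmHg (standard error 1.0 each), the pooled estimate is 2.0 mmHg with a standard error of about 0.71.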



Selection of regions for follow-up

For each trait, regions of association were selected by ranking variants by P value, recording the variant with the lowest P value as a sentinel variant and then excluding all variants ±500 kb from the sentinel and re-ranking the remaining variants. This was undertaken iteratively until all sentinel variants representing 1 Mb regions containing associations with P < 10⁻⁶ had been identified. To identify additional signals represented by secondary sentinel variants within 500 kb of each of the sentinel variants, GCTA 23 was used to run conditional analyses (conditioned on the first sentinel variant) on each of the 1 Mb regions using GWAS summary statistics and LD information from ARIC. This was done both for putatively novel regions and for regions that had previously been reported. A chi-squared test of heterogeneity of effect sizes across the 54 studies was run for each sentinel variant and those with P < 0.05 for heterogeneity were excluded from further follow-up. Variants with P < 10⁻⁶ after conditioning on the sentinel SNP (novel or known) in the region, and for which any attenuation of the -log10 P value was less than 1.5-fold, were also taken forward for replication.
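The iterative sentinel selection can be sketched in Python as below. This is a simplified illustration with a hypothetical function name and input layout. It covers only the primary sentinel selection (not the GCTA conditional analysis or the heterogeneity filter) and relies on the observation that a single pass over variants sorted by P value gives the same sentinels as repeated ranking and exclusion.

```python
def select_sentinels(variants, window_bp=500_000, p_threshold=1e-6):
    """Pick sentinel variants, one per +/-500 kb region.

    variants: iterable of (variant_id, chromosome, position_bp, p_value).
    A variant becomes a sentinel if it passes the P threshold and is not
    within `window_bp` of a more significant sentinel on the same chromosome.
    """
    candidates = sorted(
        (v for v in variants if v[3] < p_threshold), key=lambda v: v[3]
    )
    sentinels = []
    for vid, chrom, pos, p in candidates:
        shadowed = any(
            chrom == s_chrom and abs(pos - s_pos) <= window_bp
            for _, s_chrom, s_pos, _ in sentinels
        )
        if not shadowed:
            sentinels.append((vid, chrom, pos, p))
    return sentinels


# Toy input (made-up identifiers and P values, for illustration only).
toy = [
    ("a", 1, 1_000_000, 1e-9),
    ("b", 1, 1_200_000, 1e-8),  # within 500 kb of "a", so excluded
    ("c", 1, 2_000_000, 1e-7),  # a separate region on the same chromosome
    ("d", 2, 1_000_000, 1e-5),  # does not reach P < 1e-6
]
sentinel_ids = [v[0] for v in select_sentinels(toy)]
```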



Studies Stage 2

Data from 14 independent studies, totaling 87,359 individuals, and the first release of UK Biobank, totaling 140,886 individuals, were combined to replicate the findings from stage 1 (i.e. totaling 228,245 individuals). Stage 2 study details, including full study names, are given in Supplementary Table 6 and included 3C-Dijon (n=4061), Airwave (n=14023), ASCOT-SC (n=2462), ASCOT-UK (n=3803), BRIGHT (n=1791), GAPP (n=1685), GoDARTs (n=7413), GS:SFHS (n=9749), HCS (n=2112), JUPITER (n=8718), LifeLines (n=13376), NEO (n=5731), TwinsUK (n=4973), UK Biobank-CMC (n=140,886) and UKHLS (n=7462). Analysis was undertaken using the same methods as described for Stage 1 studies. UK Biobank-CMC utilized a newer imputation reference panel than the other studies and, where a requested variant was not available, a proxy was used (the variant with the next most significant P value and linkage disequilibrium r² > 0.6 with the original top variant). Results from all stage 2 studies were meta-analyzed using inverse-variance weighted meta-analysis. Two of the variants, rs1048238 and chr1:243458005:I, were not available in the largest study in Stage 2 (UK Biobank-CMC) and so proxy variants were selected (based on P value and LD).
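The proxy rule stated above (next most significant P value among variants with r² > 0.6 to the original top variant) might be expressed as in the following Python sketch. The function name and the dictionary layout are hypothetical, and the candidate values are made up for illustration.

```python
def pick_proxy(candidates, min_r2=0.6):
    """Return the most significant candidate with r2 > min_r2, or None.

    candidates: list of dicts with keys "id", "p" (stage 1 P value) and
    "r2" (LD with the unavailable top variant).
    """
    eligible = [c for c in candidates if c["r2"] > min_r2]
    return min(eligible, key=lambda c: c["p"], default=None)


# Illustrative candidates only; these are not variants from the study.
toy_candidates = [
    {"id": "proxy_1", "p": 2e-7, "r2": 0.95},
    {"id": "proxy_2", "p": 5e-8, "r2": 0.40},  # more significant but LD too weak
    {"id": "proxy_3", "p": 9e-7, "r2": 0.70},
]
chosen = pick_proxy(toy_candidates)
```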



Stage 1 + Stage 2 meta-analysis

Following meta-analysis of stage 1 and stage 2 results, signals with P > 5×10⁻⁸ were excluded. Of the signals with a final P < 5×10⁻⁸, support for independent replication within the stage 2 studies only was sought. Any signals which had P < 5×10⁻⁸ and evidence for independent replication in stage 2 alone, indicated by P < 8.2×10⁻⁴ (Bonferroni correction for 61 tests), were reported as novel signals of association with BP. Any signals which were subsequently reported by other BP GWAS that were accepted for publication during the time this analysis was ongoing, or signals for which independence from another known signal could not be established, were removed from our list of novel signals at this stage (Supplementary Table 5).
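The two significance criteria can be written down directly. The short Python sketch below assumes a family-wise alpha of 0.05 for the Bonferroni correction, which reproduces the stated threshold of 8.2×10⁻⁴ for 61 tests. The function name is hypothetical.

```python
N_REPLICATION_TESTS = 61
REPLICATION_ALPHA = 0.05 / N_REPLICATION_TESTS  # approximately 8.2e-4
GENOME_WIDE_ALPHA = 5e-8


def passes_reporting_criteria(p_combined: float, p_stage2: float) -> bool:
    """True if a signal is genome-wide significant in the combined
    stage 1 + stage 2 meta-analysis AND replicates in stage 2 alone."""
    return p_combined < GENOME_WIDE_ALPHA and p_stage2 < REPLICATION_ALPHA
```

A signal with a combined P of 1×10⁻⁹ but a stage 2 P of 1×10⁻³ would therefore not be reported, because it misses the replication threshold.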



Genotype and gene expression

We searched for signals of association of genotype with gene expression for the 22 signals (including 8 novel signals) described in this study (Supplementary Table 7) and all signals reported prior to our study (Supplementary Table 10) 3-17, 24 in 3 whole-blood data sets, 1 kidney data set and the GTEx multiple tissue data resource, which included whole-blood 25. We selected cis signals of association which were significant after controlling the False Discovery Rate (FDR) at 5%. The 3 whole-blood eQTL data sets were the NHLBI Systems Approach to Biomarker Research in Cardiovascular Disease initiative whole-blood eQTL resource (SABRe) (microarray, n=5257), NESDA-NTR (microarray, n=4896) and BIOS (RNAseq, n=2116). The whole-blood data from GTEx was based on data from 338 samples. The kidney data set comprised 236 donor-kidney samples from 134 donors 26. Full details of each data set can be found in the Supporting Information. The source transcriptomic renal data as described 26 have been deposited in the Gene Expression Omnibus (NCBI) and are accessible online through GEO Series accession number GSE43974.
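The text does not say which FDR procedure each eQTL resource used. As one common possibility, the Benjamini-Hochberg step-up procedure is sketched below in Python. This choice is an assumption made for illustration only, and the function name is hypothetical.

```python
def benjamini_hochberg(p_values, fdr=0.05):
    """Return a list of booleans marking P values significant at the given FDR.

    Step-up rule: find the largest rank k with p_(k) <= k * fdr / m,
    then declare the k smallest P values significant.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * fdr / m:
            cutoff_rank = rank
    significant = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            significant[idx] = True
    return significant
```

Because the rule is step-up, a P value that misses its own rank threshold can still be declared significant when a larger P value further down the ranking meets its threshold.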



LD lookup

The 1000 Genomes Project phase 3 release of variant calls (Feb. 20th, 2015) was used, restricted to 503 subjects of European ancestry 22. r² between the sentinel SNPs and all other bi-allelic SNPs within the corresponding 2 Mb area was calculated using the Tabix and PLINK (v1.07) software packages 27, 28. Annotation was performed using the ANNOVAR software package 29.
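For orientation, r² between two bi-allelic SNPs can be approximated as the squared Pearson correlation of allele counts (0, 1 or 2) across individuals. The Python sketch below illustrates that definition only. It is not a reimplementation of PLINK, whose internal calculation may differ, and the function name is hypothetical.

```python
def r_squared(dosages_a, dosages_b):
    """Squared Pearson correlation between two vectors of allele counts.

    Returns 0.0 if either SNP is monomorphic in the sample (zero variance),
    a convention chosen here to avoid division by zero.
    """
    n = len(dosages_a)
    mean_a = sum(dosages_a) / n
    mean_b = sum(dosages_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(dosages_a, dosages_b))
    var_a = sum((a - mean_a) ** 2 for a in dosages_a)
    var_b = sum((b - mean_b) ** 2 for b in dosages_b)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov * cov / (var_a * var_b)
```

Two SNPs with identical (or perfectly mirrored) genotypes across individuals give r² = 1, while SNPs whose genotypes vary independently give values near 0.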



Gene-based pathway analysis

All genes identified in 3 or 4 of the whole-blood eQTL resources above (Table 2), and genes containing a non-synonymous variant with r² > 0.5 with the sentinel variant (Supplementary Table 13), were tested for enrichment of biological pathways and gene ontology terms using ConsensusPathDB 30 with an FDR < 5% cut-off. Enriched pathways and GO terms containing genes only implicated by a single BP-associated variant were not reported.



Network analysis

To construct a functional association network, we combined two prioritized candidate gene sets into a single query gene set: (i) genes mapping to the non-synonymous SNPs (nsSNPs) in high LD (r² > 0.5) with the corresponding sentinel BP-associated SNP, and (ii) genes with eQTL evidence from 3 or 4 of the blood eQTL resources. Three sentinel SNPs (rs185819, rs926552 and rs805303) mapping to the HLA region on chromosome 6 were excluded from downstream analyses. The single query gene set was then used as input for the functional network analysis 31. We used the Cytoscape 32 software platform extended by the GeneMANIA 33 plugin (Data Version: 8/12/2014) 34. All the genes in the composite network, either from the query or the resulting gene sets, were then used for functional enrichment analysis against Gene Ontology terms (GO terms) 35 to identify the most relevant GO terms using the same plugin 34.
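The construction of the query gene set can be sketched as a simple set operation. In the Python sketch below the function names are hypothetical and the gene labels are placeholders rather than genes from the study. It shows only the "3 or 4 resources" filter and the union with the nsSNP gene set, not the GeneMANIA network analysis itself.

```python
from collections import Counter


def genes_in_at_least(resources, min_resources=3):
    """Genes with eQTL evidence in at least `min_resources` resources.

    resources: dict mapping resource name -> iterable of gene symbols.
    """
    counts = Counter(g for genes in resources.values() for g in set(genes))
    return {g for g, c in counts.items() if c >= min_resources}


def build_query_set(eqtl_resources, nssnp_genes, min_resources=3):
    """Union of (i) genes tagged by nsSNPs in high LD with a sentinel SNP and
    (ii) genes supported by at least `min_resources` blood eQTL resources."""
    return genes_in_at_least(eqtl_resources, min_resources) | set(nssnp_genes)


# Placeholder data for illustration only.
toy_resources = {
    "SABRe": ["GENE_A", "GENE_B", "GENE_C"],
    "NESDA-NTR": ["GENE_A", "GENE_B"],
    "BIOS": ["GENE_A", "GENE_C"],
    "GTEx": ["GENE_A", "GENE_B"],
}
query = build_query_set(toy_resources, nssnp_genes=["GENE_D"])
```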



DNase1 Hypersensitivity overlap enrichment across tissue and cell-types

The Functional element Overlap analysis of the Results of Genome Wide Association Study (GWAS) Experiments (Forge tool v1.1) 36 was used to test for enrichment of overlap of BP SNPs in tissues and cell lines from the Roadmap and ENCODE projects. All 164 SNPs were entered and 143 were included in the analysis. SNPs from 9 commonly used GWAS arrays were used to select background sets of SNPs for comparison and 10,000 background repetitions were run. A Z-score threshold of >=3.39 (estimated false positive rate of 0.5%) was used to declare significance.



Drug-gene interactions

Genes used for pathway and gene ontology enrichment analyses were further investigated for potential druggable or drugged targets using the drug gene interaction database (DGIdb). Known drug-gene interactions were interrogated across 15 source databases in DGIdb and included all types of interactions. The analysis performed for druggability prediction included all 9 databases and was restricted to expert-curated data. We also evaluated genes for known tool compounds using ChEMBL (www.ebi.ac.uk/chembl/; version 22.1).



RESULTS

The stage 1 discovery meta-analysis included 150,134 individuals (Online Methods; Supplementary Tables 1-4, Supplementary Figures 1 and 2) and 7,994,604 variants with minor allele frequency (MAF) >1% and an effective sample size of at least 60% of the total (Online Methods). We identified 61 signals in the discovery analysis that were candidates for novel BP signals (P < 10⁻⁶ for any trait; Supplementary Table 5). To ensure robustness of signals, we examined BP associations in an additional 228,245 individuals from 15 independent studies for replication, including 140,886 individuals from UK Biobank 19 (Supplementary Table 6 and Online Methods). We used the most significant (“sentinel”) SNP and trait for each locus in replication (61 tests). Twenty-two putatively novel association signals were initially confirmed showing significant evidence of replication in the independent stage-2 studies (P < 8.2×10⁻⁴, Bonferroni correction for 61 tests) and genome-wide significance (P < 5×10⁻⁸) in a meta-analysis across all 378,376 individuals (Online Methods, Table 1, Supplementary Table 7). Of these, 14 were subsequently published in two other studies 18, 19 which presented genome-wide significant associations with evidence of replication. A further two were highlighted as putative novel signals in one of those studies 18 but had not been confirmed by replication. In our study, we report the 6 remaining novel signals, and the 2 previously unconfirmed signals (in EBF2 and in NFKBIA), as novel signals. The 8 novel signals included 7 signals at 7 independent loci (Supplementary Figure 3) and one novel independent signal near a previously reported hit near TNXB (Online Methods, Supplementary Table 8, Supplementary Figure 4). The novel signals show both significant evidence of replication in the independent stage-2 studies (P < 8.2×10⁻⁴, Bonferroni correction for 61 tests) and genome-wide significance (P < 5×10⁻⁸) in a meta-analysis across all 378,376 individuals.
The sentinel variants at all 8 signals were common (MAF>5%) and the novel secondary signal at TNXB was in high linkage disequilibrium (r² > 0.8) with a non-synonymous SNP. With the exception of rs9710247, which was only significant for association with DBP, all signals were significantly associated (P<0.006, Bonferroni corrected for 8 tests) with all 3 traits (Table 1 and Supplementary Table 9).

We next sought to identify which genes might have expression levels that were associated with genotypes of the BP-associated variants reported in this study and others. Strong evidence of an association with expression of a specific gene may provide clues as to which gene(s) might be functionally relevant to that signal. We took the 139 BP association signals reported prior to these studies 18, 19, and 22 novel signals of association identified and confirmed in this study and two contemporaneous studies 3-19, 24 (Supplementary Table 10), and searched for evidence of association with gene expression in whole-blood (four studies, total n=12,607; Online Methods) and in kidney tissue (n=134, the largest kidney eQTL resource currently available). Although of unclear direct relevance to BP, whole-blood was studied due to the availability of large data sets enabling a powerful assessment of expression patterns that are likely present across multiple cell and tissue types. In addition, circulating blood cells have been used for ion transport experiments in the past, and altered ion transport levels in erythrocytes were linked to hypertension 37. Kidney was chosen because of the many renal pathways that regulate BP and outstanding questions about the relevance of kidney pathways to the genetic component of BP regulation in the general population 3, 15. Expression quantitative trait loci (eQTL) signals were filtered by false discovery rate (FDR<5%) and we examined cis (within 1 Mb) associations only (Online Methods and Supporting Information).

The four blood eQTL data sets were NESDA-NTR 38, 39, SABRe 15, the BIOS resource 40 and GTEx 25 (Online Methods and Supporting Information). The BIOS resource (n=2,116) has not previously been utilized in the analysis of BP associations, whereas findings from NESDA-NTR and SABRe have been reported for a subset of the previously published signals 16, 17. For a total of 369 genes, gene expression was associated with the BP SNP in one or more of the 4 blood datasets at experiment-wide significance (Supplementary Table 11). This included 14 genes for 6 of the 8 novel signals. For 110 genes, we found eQTL evidence in 2 out of 4 datasets (Figure 1), including 4 genes for 2 of the novel signals: EIF4B and TNS2 for rs73099903, and MAP3K10 and PLD3 for rs9710247. SNP rs73099903 was in strong linkage disequilibrium (LD r² > 0.9) with the SNP most strongly associated with TNS2 expression in the BIOS resource. TNS2 encodes a tensin focal adhesion molecule and may have a role in renal function 41.

For 48 genes, we found evidence in 3 out of the 4 resources (Table 2), suggesting robustness of the SNP-gene expression correlation signal and highlighting those genes as potential candidates in genetic BP regulation. Of the 48 genes, 28 have not previously been described in eQTL analyses using BP associated SNPs and all were correlated with previously reported BP association signals.

In the kidney dataset (TransplantLines) 26, there was association of gene expression and genotype for nine SNPs and 13 genes (Table 2, Figure 1 and Supplementary Table 12). Nine of the SNP-gene expression associations were also observed in the whole-blood eQTL datasets, suggesting that those signals may not be unique to the kidney. We report three signals that were unique to the kidney and not previously reported (C4orf34, HIP2 and ASIC1) and confirm a previously reported kidney eQTL signal for an anti-sense RNA for PSMD5 15. The same SNP was also an eQTL for PSMD5 itself in both blood and kidney. ASIC1 encodes the Acid Sensing Ion Channel Subunit 1 which may interact (and be co-expressed) with ENaC subunits which mediate trans-epithelial Na transport in the distal nephron of the kidney 42. The comparatively small number of signals using kidney tissue (Table 2 and Figure 1) compared to whole-blood could be due to the small sample size. Complete GTEx results are in Supplementary Table 13.

For genes implicated by eQTL information from whole-blood, we tested for enrichment of biological pathways and gene ontologies (Online Methods). The 48 genes implicated by 3 or 4 blood eQTL resources (Table 2), together with a further 53 genes containing a non-synonymous variant with r² > 0.5 with the top SNP (Supplementary Table 14), were enriched in pathways and ontology terms related to actin and striated muscle (Supplementary Tables 15 and 16, Online Methods). Network analysis using the same genes highlighted further GO terms relating to muscle function, particularly cardiac muscle (Online Methods, Supplementary Table 17). We tested the overlap of 161 non-HLA BP-associated variants with DNase hypersensitivity sites identified in the Roadmap and ENCODE cell lines (Online Methods) and identified an overall enrichment in multiple cell and tissue types including heart, kidney and smooth muscle (Supplementary Figure 5).

We next investigated these genes for potential suitability as drug targets (druggability), known tool compounds and clinically approved drugs using the drug gene interaction database (DGIdb) 43 (Supplementary Table 18). Twelve genes had known drugs, including four genes with known antihypertensive drugs. We noted that drugs modulating all but one of the 12 drugged targets had a reported influence on blood pressure, either as a primary antihypertensive indication or as a reported side effect of raised blood pressure. Twenty additional genes were predicted to be druggable; among these, 7 genes have known small molecule tool modulators, based on a query of the ChEMBL database (www.ebi.ac.uk/chembldb/; version 22.1).



DISCUSSION

Enhanced discovery of BP loci increases the number of potential targets for therapeutic advances. Following the rapid recent growth in the number of known BP loci, we report 8 novel signals that implicate 5 regions of the genome not previously connected to blood pressure regulation.

Six of the 8 novel signals we report had not previously been reported. Two signals (in EBF2 and NFKBIA) have been suggested previously but without evidence for replication 18. For these two signals we present, for the first time, stringent evidence of replication, confirming their relevance to blood pressure genetics.

The path from signal to genes is the essential next step towards realizing the therapeutic potential of a genetic locus and understanding the mechanisms of BP regulation. We have used several large eQTL resources as a first step to realize this objective. As expected, we observed that even across eQTL studies of the same tissue, there is limited overlap in experiment-wide significant signals, suggesting either biological variability (differences in the characteristics of the samples or in the methods for extraction and processing of mRNA in each of the studies), technology-specific differences in coverage of genes (use of RNAseq data for the BIOS blood dataset and microarray-based expression levels for the kidney and other blood datasets), or the possibility of false positive results despite stringent within-experiment significance thresholds. We were unable to distinguish these scenarios using the data available to us, but by selecting genes that were significant in at least three resources, and therefore robust to these differences, we identified 48 genes as candidates for further study. These results are limited by the availability of large eQTL resources for whole-blood only, which precludes well-powered comparisons across tissue types, particularly as the origin of blood pressure control is unlikely to be located in the blood. Enrichment and pathway analyses using these genes, and genes containing a correlated functional variant, highlight the potential relevance of muscular tissue and pathways, compatible with a vascular and cardiac origin of BP genetics, extending previous evidence 15. We identify a number of drugged targets in the pathways identified, including four existing hypertension targets. Other drugs identified are not suitable candidates for repositioning to hypertension, as most were reported in adverse events to raise blood pressure. However, the targets would be valid for investigation using a reverse mechanism, e.g. agonism in place of inhibition.
We also identified seven genes with small molecule tool modulators (mainly inhibitory or binding). These molecules and targets might be suitable candidates for further investigation to build a target validation case to support clinical investigation in hypertension.

Amongst the genes implicated in our eQTL analyses were several for which there is already some evidence that they are relevant to blood pressure regulation. The intronic SNP rs10926988 was independently associated with expression of SDCCAG8 in all four whole-blood resources. Rare mutations in SDCCAG8 cause Bardet-Biedl syndrome, which features hypertension. Expression levels of MYBPC3 were correlated with rs7103648 15 in the 3 largest blood eQTL resources (i.e. SABRe, NESDA-NTR and BIOS). MYBPC3 encodes the cardiac isoform of myosin-binding protein C, which is expressed in heart muscle, and mutations in MYBPC3 are known to cause familial hypertrophic cardiomyopathy 44.

This study has several limitations. Given the nature of statistical power for genome-wide association analyses, the sample size is limited, even though this is one of the largest efforts in BP GWAS undertaken so far. The study would clearly have benefited from eQTL resources covering multiple tissues with larger sample sizes than those available today. Our analyses were limited to cis-signals; future analyses, with larger sample sizes, might also consider trans-signals.

In summary, our study reports novel BP association signals and new candidate BP genes, contributing to the transition from variants to genes to explain BP variation.

