Summary-data-based Mendelian Randomization (SMR) - Practicals
Background
Statins, which are inhibitors of HMGCR, are cholesterol-lowering medications that are widely prescribed for treating cardiovascular diseases. Evidence from small-scale randomised controlled trials and observational studies on the potential anti-depressive effects of statins is conflicting.
This inconsistency in findings may be due to differences in the chemical and pharmacokinetic properties of statin compounds studied, small sample sizes, varying follow-up times, unmeasured confounders and heterogeneity in depression pathophysiology. Mendelian randomisation (MR), a statistical genomic method that uses genetic instruments to proxy for drug exposure, is less prone to unmeasured confounder bias and reverse causation.
The validity of MR analysis relies on the genetic instruments meeting three key assumptions:
Genetic instruments are strongly associated with the exposure.
Genetic instruments are not associated with confounders.
Genetic instruments are associated with the outcome only through the exposure (no horizontal pleiotropy).
More details on these assumptions, and on the statistical methods available to test them, can be found in Davies et al1.
Aims
The aim of this practical is to use MR analysis to investigate the potential causal effects of HMGCR inhibition (the intended target of statins) on depression risk and symptoms. This practical is a simplified version of the analysis presented in Jiang et al2. More specifically, this practical is divided into three parts, with each part addressing one of the aims below:
Identify suitable eQTLs as genetic instruments to proxy for the inhibition of HMGCR;
Validate the genetic instruments using control traits, which are known effects of statin use;
Investigate the association of genetically predicted HMGCR inhibition with depression risk and related traits.
Method Overview and Resources
We will perform the MR analysis using the SMR tool3, which is available here. We will use blood eQTLs of HMGCR, from the eQTLGen dataset (N = 31,684; available here), to proxy for HMGCR inhibition. We will use publicly available GWAS summary statistics for diverse disease and molecular traits to investigate the association between genetically predicted HMGCR inhibition and depression-related traits.
Overview of the Working Directory
Let’s first have a look at our data directory. This is the folder where we store all the data necessary to perform the analysis.
# Let's have a look at the content of this directory
ls /data/module6/Practicals/Practical3_MR_DrugEffects/
We can see that there are four sub-directories here:
Sub-directory name | Content |
---|---|
data | This directory contains the eQTL, LD reference and GWAS summary statistics datasets that we will use in the SMR analysis. NOTE: the README file in this directory lists the source of each dataset and the code used to download and format it. |
output_backup | This directory contains the results of the pre-run SMR analysis. They are used as backups and you can compare your own results against these files if you want. |
smr | This directory contains the SMR binary file that we will use in the SMR analysis. |
coloc | This directory contains the R script for performing the extension analysis using coloc. |
Before we get started with the analysis, let’s create an output directory where we will store the MR results.
# Let’s create the output directory
cd ~
mkdir ./SMR_HMGCR_prac
cd ./SMR_HMGCR_prac
mkdir ./output/
Analysis
Aim 1: Identification of genetic instruments
Before we do the MR analysis, we want to make sure that there are strong blood eQTLs for HMGCR. We can check this by querying (with the SMR tool) the eQTLGen dataset for any SNPs that are strongly associated with HMGCR expression.
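Before running the query, it can help to confirm that the eQTL summary data are where we expect them. The sketch below assumes the eQTLGen data are stored in SMR's BESD format, which uses three files sharing one prefix (.besd, .esi and .epi). check_besd is a hypothetical helper name introduced here for illustration and is not part of SMR.

```shell
# check_besd PREFIX: report whether the three BESD files for PREFIX exist.
# (hypothetical helper for this practical, not an SMR command)
check_besd() {
  prefix=$1
  missing=0
  for ext in besd esi epi; do
    if [ ! -f "$prefix.$ext" ]; then
      echo "missing: $prefix.$ext"
      missing=1
    fi
  done
  return "$missing"
}

# Prefix taken from the SMR query command used in this practical
eqtl_prefix=/data/module6/Practicals/Practical3_MR_DrugEffects/data/eQTLGen/eQTLGen_cis_eQTLs
if check_besd "$eqtl_prefix"; then
  echo "eQTL summary files found"
else
  echo "check the path before running smr"
fi
```

If any file is reported as missing, check the path against the output of the ls command above before going further.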
# Copy and paste the following command to run an SMR query of the eQTLGen dataset
smr \
--beqtl-summary /data/module6/Practicals/Practical3_MR_DrugEffects/data/eQTLGen/eQTLGen_cis_eQTLs \
--query 5.0e-8 \
--gene ENSG00000113161 --out ./output/HMGCR_eQTLs
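While SMR is running, let’s have a look at the specific flags in the command:
Flag | Description |
---|---|
--beqtl-summary | This is the prefix of the eQTL summary data in SMR binary (BESD) format |
--query | This tells SMR to write out, in text format, the eQTL associations with a p-value below the given threshold (here 5.0e-8) |
--gene | This restricts the query to one gene; ENSG00000113161 is the Ensembl ID of HMGCR |
--out | This is the prefix of the output file |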
Once the analysis is completed, we can have a look at the results:
# We will have a look at the first nine SNPs
head ./output/HMGCR_eQTLs.txt
After the header, each line shows the association between one SNP and HMGCR expression in whole blood. Let’s focus on a few columns:
Column name | Content |
---|---|
SNP | This is the rsID of the SNP |
A1 | This is the effect allele |
A2 | This is the non-effect allele |
b | This is the effect size associated with the effect allele A1, which shows the direction and magnitude of the association between this SNP and HMGCR expression in whole blood |
se | This is the standard error for the effect size |
p | This is the p-value for the effect size |
As we can see, there are many SNPs that are strongly associated with HMGCR expression in whole blood, so we can proceed with the validation analysis.
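To put a number on "strongly associated", one common rule of thumb is the per-SNP F-statistic, approximated as (b / se)^2, with F > 10 often taken as the minimum for a usable instrument. The sketch below computes it with awk on a small toy file; the SNP names and values are made up, and the header layout is an assumption based on the columns described above. To try it on your own results, point eqtl_file at ./output/HMGCR_eQTLs.txt instead.

```shell
# Toy stand-in for ./output/HMGCR_eQTLs.txt (made-up rows, not real eQTLGen results)
tmpdir=$(mktemp -d)
eqtl_file="$tmpdir/toy_eQTLs.txt"
cat > "$eqtl_file" <<'EOF'
SNP Chr BP A1 A2 Freq Probe Probe_Chr Probe_bp Gene Orientation b SE p
rs_toy_1 5 100 A G 0.30 ENSG00000113161 5 150 ENSG00000113161 + 0.10 0.010 1.5e-23
rs_toy_2 5 200 T C 0.45 ENSG00000113161 5 150 ENSG00000113161 + -0.24 0.012 5.5e-89
rs_toy_3 5 300 G A 0.12 ENSG00000113161 5 150 ENSG00000113161 + 0.06 0.010 2.0e-09
EOF

# Find columns by (lower-cased) header name, then print SNP and F = (b / se)^2
f_table=$(awk '
  NR == 1 { for (i = 1; i <= NF; i++) col[tolower($i)] = i; next }
  { printf "%s\t%.1f\n", $(col["snp"]), ($(col["b"]) / $(col["se"])) ^ 2 }
' "$eqtl_file")
echo "$f_table"

# The SNP with the largest F-statistic is the strongest candidate instrument
top_snp=$(echo "$f_table" | sort -k2,2nr | head -n 1 | cut -f1)
echo "strongest instrument: $top_snp"
```

Looking columns up by header name rather than by position keeps the snippet working if the column order differs from the toy file.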
Aim 2: Validation of genetic instruments using control traits
We need to make sure that the significant eQTLs of HMGCR can indeed be used to proxy for HMGCR inhibition (i.e. the effect of taking statin medications). To do this, we can perform MR analysis with control traits that have previously been linked to statin exposure. As statins are cholesterol-lowering medications, we will use LDL cholesterol levels as our outcome trait.
# Copy and paste the following command to run an SMR analysis with LDL cholesterol
smr \
--bfile /data/module6/Practicals/Practical3_MR_DrugEffects/data/LD_ref/1000G_phase3_20130502_combined_chr5 \
--gwas-summary /data/module6/Practicals/Practical3_MR_DrugEffects/data/GWAS_sumstats/Lipids_sumstats/formatted_LDL \
--beqtl-summary /data/module6/Practicals/Practical3_MR_DrugEffects/data/eQTLGen/eQTLGen_cis_eQTLs \
--gene ENSG00000113161 \
--diff-freq-prop 0.1 --out ./output/HMGCR_LDL_SMR
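Again, while SMR is running, let’s have a look at the specific flags in the command:
Flag | Description |
---|---|
--bfile | This is the prefix of the genotype data (PLINK binary format) from a reference sample, used to estimate LD |
--gwas-summary | This is the GWAS summary statistics file for the outcome trait (here LDL cholesterol) |
--beqtl-summary | This is the prefix of the eQTL summary data in SMR binary (BESD) format |
--gene | This restricts the analysis to one gene; ENSG00000113161 is the Ensembl ID of HMGCR |
--diff-freq-prop | This is the maximum proportion of SNPs allowed to show large allele frequency differences between the datasets before SMR stops |
--out | This is the prefix of the output file |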
Once the analysis is completed, we can have a look at the results:
# Reading the result file
cat ./output/HMGCR_LDL_SMR.smr
Let’s focus on a few columns:
Column name | Content |
---|---|
topSNP | This is the SNP used as the genetic instrument for the SMR analysis |
A1 | This is the effect allele |
A2 | This is the non-effect allele |
b_SMR | This is the effect size, the estimated change in the outcome per 1 standard-deviation increase in gene expression |
se_SMR | This is the standard error for beta |
p_SMR | This is the p-value for beta |
p_HEIDI | This is the p-value for the HEIDI test1 |
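For intuition about where b_SMR and se_SMR come from: as described in the SMR paper (Zhu et al3), the estimate is the ratio of the instrument's effect on the outcome to its effect on gene expression, and the test statistic combines the two z-scores. The sketch below reproduces that arithmetic with awk; the four input numbers are made up for illustration and are not taken from the practical's data.

```shell
# Made-up summary statistics for a single instrument SNP
b_eqtl=0.5; se_eqtl=0.05   # SNP effect on gene expression (b_eQTL, se_eQTL)
b_gwas=0.1; se_gwas=0.02   # SNP effect on the outcome (b_GWAS, se_GWAS)

smr_est=$(awk -v bzx="$b_eqtl" -v szx="$se_eqtl" -v bzy="$b_gwas" -v szy="$se_gwas" 'BEGIN {
  zzx = bzx / szx                                   # z-score of the eQTL association
  zzy = bzy / szy                                   # z-score of the GWAS association
  b_smr  = bzy / bzx                                # ratio estimate
  t_smr  = (zzx^2 * zzy^2) / (zzx^2 + zzy^2)        # approx. chi-squared with 1 df
  se_smr = sqrt(b_smr^2 / t_smr)                    # since T_SMR = (b_SMR / se_SMR)^2
  printf "b_SMR=%.3f se_SMR=%.4f T_SMR=%.2f\n", b_smr, se_smr, t_smr
}')
echo "$smr_est"
```

With these toy inputs the eQTL z-score is 10 and the GWAS z-score is 5, so T_SMR is 20, smaller than either squared z-score on its own: the SMR test is always less significant than the weaker of the two associations it combines.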
Consider the following questions:
Is there a significant association between HMGCR expression and LDL cholesterol levels?
If so, in what direction? (hint: p_SMR < 0.05 defines statistical significance)
A p_HEIDI < 0.01 suggests that the observed association is driven by two or more distinct variants in LD, rather than by a single variant that affects both HMGCR expression and the outcome.
What is the p_HEIDI in this analysis?
What does it mean?
Does this corroborate existing evidence on the cholesterol-lowering effect of statins?
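The questions above can also be answered programmatically. The sketch below reads a toy .smr file (made-up values, with a header layout assumed to match the SMR output described above) and applies the thresholds used in this practical: p_SMR < 0.05 for significance and p_HEIDI < 0.01 for heterogeneity. Because b_SMR is expressed per increase in expression while statins inhibit HMGCR, the sketch also reports the sign-flipped direction as the implied effect of inhibition; treat that flip as an interpretive convention rather than something SMR outputs.

```shell
# Toy stand-in for ./output/HMGCR_LDL_SMR.smr (made-up values, not real results)
tmpdir=$(mktemp -d)
smr_file="$tmpdir/toy.smr"
cat > "$smr_file" <<'EOF'
probeID ProbeChr Gene Probe_bp topSNP topSNP_chr topSNP_bp A1 A2 Freq b_GWAS se_GWAS p_GWAS b_eQTL se_eQTL p_eQTL b_SMR se_SMR p_SMR p_HEIDI nsnp_HEIDI
ENSG00000113161 5 ENSG00000113161 150 rs_toy_2 5 200 T C 0.45 -0.05 0.005 1.5e-23 -0.24 0.012 5.5e-89 0.208 0.023 3.7e-19 0.35 20
EOF

smr_summary=$(awk '
  NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
  {
    b  = $(col["b_SMR"]) + 0      # force numeric comparison
    p  = $(col["p_SMR"]) + 0
    ph = $(col["p_HEIDI"])        # may be NA if the HEIDI test was not run
    print "significant at p_SMR < 0.05: " (p < 0.05 ? "yes" : "no")
    print "higher expression -> outcome: " (b > 0 ? "increase" : "decrease")
    print "inhibition (sign flipped) -> outcome: " (b > 0 ? "decrease" : "increase")
    if (ph == "NA")         print "HEIDI: not available"
    else if (ph + 0 < 0.01) print "HEIDI: heterogeneity detected (p_HEIDI < 0.01)"
    else                    print "HEIDI: no evidence of heterogeneity (p_HEIDI >= 0.01)"
  }
' "$smr_file")
echo "$smr_summary"
```

To apply this to your own run, set smr_file to ./output/HMGCR_LDL_SMR.smr and compare the printed summary with your answers.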
Extension questions:
Well done on finishing the practical! You have completed your first drug repurposing analysis using MR.
For the extension analysis, think about the following scenarios:
- Statins have previously been reported to show off-target inhibition of ITGAL and HDAC2. Design a workflow to investigate whether these off-target effects may mediate the potential anti-depressive effects of statins.
- By default, SMR chooses the strongest eQTL as the genetic instrument. An SNP named rs12916 has previously been used as a genetic instrument to proxy for HMGCR inhibition. Repeat the SMR analysis in the practical using rs12916 as the genetic instrument.
- The eQTL dataset used in this practical was profiled using whole blood. Considering that the brain is biologically more relevant to the aetiology of depression, we can perform a sensitivity analysis using eQTLs generated in brain tissues. Repeat the SMR analysis in the practical using eQTL data generated in the prefrontal cortex.
- Coloc is a tool that can be used to test whether two association datasets share a common causal variant4. As a sensitivity analysis, run the coloc analysis between HMGCR expression in the brain and platelet counts.
# Run the coloc analysis
Rscript /data/module6/Practicals/Practical3_MR_DrugEffects/coloc/coloc.R
# Let’s take a look at the results
cat ./output/HMGCR_PlateletCount_coloc_summary.csv
- Pick a drug that you are interested in repurposing and design a workflow to investigate its repurposing potential to treat a disease.
References:
1. Davies NM, Holmes MV, Davey Smith G. Reading Mendelian randomisation studies: a guide, glossary, and checklist for clinicians. BMJ. 2018;362:k601.
2. Jiang J-C, Hu C, McIntosh AM, Shah S. Investigating the potential anti-depressive mechanisms of statins: a transcriptomic and Mendelian randomization analysis. Transl Psychiatry. 2023;13:110.
3. Zhu Z, Zhang F, Hu H, Bakshi A, Robinson MR, Powell JE, Montgomery GW, Goddard ME, Wray NR, Visscher PM, et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet. 2016;48:481-487.
4. Giambartolomei C, Vukcevic D, Schadt EE, Franke L, Hingorani AD, Wallace C, Plagnol V. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 2014;10:e1004383.
Footnotes
HEIDI = HEterogeneity In Dependent Instruments. HEIDI tests for heterogeneity of SNP effects in a locus, under the null hypothesis that there is only one causal SNP and the other significant SNPs are only significant due to LD with the causal SNP. p_HEIDI < 0.01 is a rejection of the null hypothesis that there is only one causal SNP, and suggests that there are multiple independent associations at this locus.