Summary-data-based Mendelian Randomization (SMR) - Practicals

Author

Dr Clara Jiang

Background

Statins, which are inhibitors of HMGCR, are cholesterol-lowering medications that are widely prescribed for treating cardiovascular diseases. There is conflicting evidence from small-scale randomised controlled trials and observational studies on the potential anti-depressive effects of statins.

This inconsistency in findings may be due to differences in the chemical and pharmacokinetic properties of statin compounds studied, small sample sizes, varying follow-up times, unmeasured confounders and heterogeneity in depression pathophysiology. Mendelian randomisation (MR), a statistical genomic method that uses genetic instruments to proxy for drug exposure, is less prone to unmeasured confounder bias and reverse causation.

The validity of MR analysis relies on the genetic instruments meeting three key assumptions:

Genetic instruments are strongly associated with the exposure.
Genetic instruments are not associated with confounders.
Genetic instruments are associated with the outcome only through the exposure (no horizontal pleiotropy).

More details on these assumptions and the statistical methods available to test them can be found in Davies et al¹.

Aims

The aim of this practical is to use MR analysis to investigate the potential causal effects of HMGCR inhibition (the intended target of statins) on depression risk and symptoms. This practical is a simplified version of the analysis presented in Jiang et al². More specifically, this practical is divided into three parts, with each part addressing one of the aims below:

Identify suitable eQTLs as genetic instruments to proxy for the inhibition of HMGCR;
Validate the genetic instruments using control traits, which are known effects of statin use;
Investigate the association of genetically predicted HMGCR inhibition with depression risk and related traits.

Method Overview and Resources

We will perform the MR analysis using the SMR tool³, which is available here. We will use blood eQTLs of HMGCR, from the eQTLGen dataset (N = 31,684; available here), to proxy for HMGCR inhibition. We will use publicly available GWAS summary statistics for diverse disease and molecular traits to investigate the association between genetically predicted HMGCR inhibition and depression-related traits.

Overview of the Working Directory

Let’s first have a look at our data directory. This is the folder where we store all the data necessary to perform the analysis.

# Let's have a look at the content of this directory
ls /data/module6/Practicals/Practical3_MR_DrugEffects/

We can see that there are four sub-directories here:

Sub-directory name	Content
data	This directory contains the eQTL, LD reference and GWAS summary statistics datasets that we will use in the SMR analysis. NOTE: the README file in this directory lists the source of each dataset and the code used to download and format it.
output_backup	This directory contains the results of the pre-run SMR analysis. They are used as backups and you can compare your own results against these files if you want.
smr	This directory contains the SMR binary file that we will use in the SMR analysis.
coloc	This directory contains the R script for performing the extension analysis using coloc.

Before we get started with the analysis, let’s create an output directory where we will store the MR results.

# Let’s create the output directory (-p avoids an error if it already exists)
cd ~
mkdir -p ./SMR_HMGCR_prac
cd ./SMR_HMGCR_prac
mkdir -p ./output/

Analysis

Aim 1: Identification of genetic instruments

Before we do the MR analysis, we want to make sure that there are strong blood eQTLs for HMGCR. We can check this by querying (with the SMR tool) the eQTLGen dataset for any SNPs that are strongly associated with HMGCR expression.

# Copy and paste the following command to run an SMR query of the eQTLGen dataset
smr \
--beqtl-summary /data/module6/Practicals/Practical3_MR_DrugEffects/data/eQTLGen/eQTLGen_cis_eQTLs \
--query 5.0e-8 \
--gene ENSG00000113161 \
--out ./output/HMGCR_eQTLs

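While SMR is running, let’s have a look at the specific flags in the command:

Flag	Content
--beqtl-summary	This reads the eQTL summary data in the binary (BESD) format used by SMR; here, the eQTLGen cis-eQTLs
--query	This extracts, in text format, the SNP-gene associations with an eQTL p-value below the given threshold (here 5.0e-8)
--gene	This restricts the query to one gene; ENSG00000113161 is the Ensembl ID of HMGCR
--out	This is the prefix of the output file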

Once the analysis is completed, we can have a look at the results:

# We will have a look at the first nine SNPs
head ./output/HMGCR_eQTLs.txt

After the header, each line shows the association between one SNP and HMGCR expression in whole blood. Let’s focus on a few columns:

Column name	Content
SNP	This is the rsID of the SNP
A1	This is the effect allele
A2	This is the non-effect allele
b	This is the effect size associated with the effect allele A1, which shows the direction and magnitude of the association between this SNP and HMGCR expression in whole blood
se	This is the standard error for the effect size
p	This is the p-value for the effect size
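Because SMR uses the strongest eQTL as its instrument by default, it can be useful to see which SNP that is likely to be. The sketch below ranks the query output by the absolute value of b/se rather than by p, since very small p-values can underflow to zero in awk. It assumes that the file has a header row containing SNP, A1, A2, b, se and p columns (matched case-insensitively); the function name top_eqtl is made up for this illustration.

```shell
# Sketch: report the strongest eQTL (largest |b/se|) in an SMR --query output.
# Columns are located by header name (case-insensitive), so their order in the
# file does not matter. Output: SNP, A1, A2, b, se, p, z (tab-separated).
top_eqtl() {
    awk '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[tolower($i)] = i
            n = split("snp a1 a2 b se p", req, " ")
            for (k = 1; k <= n; k++) {
                if (!(req[k] in col)) {
                    print "missing column: " req[k] > "/dev/stderr"
                    exit 1
                }
            }
            next
        }
        # Skip rows without a usable standard error
        $col["se"] + 0 > 0 {
            z = $col["b"] / $col["se"]
            if (!seen || z * z > best_z2) {
                seen = 1
                best_z2 = z * z
                best = $col["snp"] "\t" $col["a1"] "\t" $col["a2"]
                best = best "\t" $col["b"] "\t" $col["se"] "\t" $col["p"]
                best = best "\t" sprintf("%.2f", z)
            }
        }
        END { if (seen) print best }
    ' "$1"
}

# Usage in this practical (runs only if the query output exists)
if [ -f ./output/HMGCR_eQTLs.txt ]; then
    top_eqtl ./output/HMGCR_eQTLs.txt
fi
```

The SNP reported here can be compared with the topSNP column of the SMR results in the next step.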

As we can see, there are many SNPs that are strongly associated with HMGCR expression in whole blood, so we can proceed with the validation analysis.

Aim 2: Validation of genetic instruments using control traits

We need to make sure that the significant eQTLs of HMGCR can indeed be used to proxy for HMGCR inhibition (i.e. the effect of taking statin medications). To do this, we can perform MR analysis with control traits that have previously been linked to statin exposure. As statins are cholesterol-lowering medications, we will use LDL cholesterol levels as our outcome trait.

# Copy and paste the following command to run an SMR analysis with LDL cholesterol
smr \
--bfile /data/module6/Practicals/Practical3_MR_DrugEffects/data/LD_ref/1000G_phase3_20130502_combined_chr5 \
--gwas-summary /data/module6/Practicals/Practical3_MR_DrugEffects/data/GWAS_sumstats/Lipids_sumstats/formatted_LDL \
--beqtl-summary /data/module6/Practicals/Practical3_MR_DrugEffects/data/eQTLGen/eQTLGen_cis_eQTLs \
--gene ENSG00000113161 \
--diff-freq-prop 0.1 \
--out ./output/HMGCR_LDL_SMR

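Again, while SMR is running, let’s have a look at the specific flags in the command:

Flag	Content
--bfile	This reads the genotype data (PLINK binary format) of a reference sample used to estimate LD; here, chromosome 5 of the 1000 Genomes data, where HMGCR is located
--gwas-summary	This reads the GWAS summary statistics of the outcome trait
--beqtl-summary	This reads the eQTL summary data in the binary (BESD) format used by SMR
--gene	This restricts the analysis to one gene; ENSG00000113161 is the Ensembl ID of HMGCR
--diff-freq-prop	This is the maximum proportion of SNPs allowed to show allele frequency differences between the datasets (the SMR default is 0.05)
--out	This is the prefix of the output file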

Once the analysis is completed, we can have a look at the results:

# Reading the result file
cat ./output/HMGCR_LDL_SMR.smr

Let’s focus on a few columns:

Column name	Content
topSNP	This is the SNP used as the genetic instrument for the SMR analysis
A1	This is the effect allele
A2	This is the non-effect allele
b_SMR	This is the effect size, the estimated change in the outcome per 1 standard-deviation increase in gene expression
se_SMR	This is the standard error for b_SMR
p_SMR	This is the p-value for b_SMR
p_HEIDI	This is the p-value for the HEIDI test¹
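For reference, the SMR estimate described by Zhu et al³ is a ratio of the two SNP effects at the top eQTL. In the notation below (introduced here for illustration), z denotes the topSNP, x the expression of HMGCR and y the outcome trait, so that b_SMR corresponds to the first expression and p_SMR is obtained from the test statistic, which is approximately chi-squared distributed with one degree of freedom.

```latex
\[
  \hat{b}_{xy} = \frac{\hat{b}_{zy}}{\hat{b}_{zx}},
  \qquad
  T_{\mathrm{SMR}} = \frac{z_{zy}^{2}\, z_{zx}^{2}}{z_{zy}^{2} + z_{zx}^{2}}
  \;\overset{\text{approx.}}{\sim}\; \chi^{2}_{1},
  \qquad
  z_{zy} = \frac{\hat{b}_{zy}}{\mathrm{se}(\hat{b}_{zy})},
  \quad
  z_{zx} = \frac{\hat{b}_{zx}}{\mathrm{se}(\hat{b}_{zx})}
\]
```

Here the numerator effect is the SNP effect on the outcome from the GWAS and the denominator is the SNP effect on expression from the eQTL data, which is why a strong eQTL (a large z for the SNP-expression association) is needed for a stable estimate.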

Consider the following questions:

Question 1:

Is there a significant association between HMGCR expression and LDL cholesterol levels?

If so, in what direction? (hint: p_SMR < 0.05 defines statistical significance)

Question 2:

A p_HEIDI < 0.01 indicates heterogeneity at the locus: the observed association is likely driven by two or more distinct causal variants in LD, rather than by a single variant that affects both gene expression and the trait.

What is the p_HEIDI in this analysis?

What does it mean?

Does this corroborate existing evidence on the cholesterol-lowering effect of statins?
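The .smr file is wide, so the columns described above can be easier to read once pulled out by name. The sketch below is one way to do this; it assumes the header contains the columns Gene, topSNP, b_SMR, se_SMR, p_SMR and p_HEIDI, and it applies the thresholds from the hints (0.05 for p_SMR, 0.01 for p_HEIDI) as defaults. The function name summarise_smr and the labels it prints are made up for this illustration.

```shell
# Sketch: print the key columns of an SMR result file with simple labels.
# Usage: summarise_smr FILE [p_SMR threshold] [p_HEIDI threshold]
# Output (tab-separated, no header):
#   Gene, topSNP, b_SMR, se_SMR, p_SMR, SMR label, p_HEIDI, HEIDI label
summarise_smr() {
    awk -v p_thr="${2:-0.05}" -v heidi_thr="${3:-0.01}" -v OFS='\t' '
        NR == 1 {
            for (i = 1; i <= NF; i++) col[$i] = i
            n = split("Gene topSNP b_SMR se_SMR p_SMR p_HEIDI", req, " ")
            for (k = 1; k <= n; k++) {
                if (!(req[k] in col)) {
                    print "missing column: " req[k] > "/dev/stderr"
                    exit 1
                }
            }
            next
        }
        {
            p_smr = $col["p_SMR"]
            p_heidi = $col["p_HEIDI"]
            if (p_smr == "NA") smr_label = "not_tested"
            else smr_label = (p_smr + 0 < p_thr + 0) ? "significant" : "not_significant"
            if (p_heidi == "NA") heidi_label = "not_tested"
            else heidi_label = (p_heidi + 0 < heidi_thr + 0) ? "heterogeneity" : "no_heterogeneity"
            print $col["Gene"], $col["topSNP"], $col["b_SMR"], $col["se_SMR"], p_smr, smr_label, p_heidi, heidi_label
        }
    ' "$1"
}

# Usage in this practical (runs only if the result file exists)
if [ -f ./output/HMGCR_LDL_SMR.smr ]; then
    summarise_smr ./output/HMGCR_LDL_SMR.smr
fi
```

When interpreting the sign, keep in mind that b_SMR is expressed per increase in HMGCR expression, so the expected effect of HMGCR inhibition (lower activity of the target) would point in the opposite direction.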

Aim 3: Analysis of the association between genetically predicted HMGCR inhibition and depression-related traits

To investigate the repurposing potential of statins for treating depression, we will repeat the SMR analysis with depression risk as an outcome.

# Copy and paste the following command to run an SMR analysis with depression risk
smr \
--bfile /data/module6/Practicals/Practical3_MR_DrugEffects/data/LD_ref/1000G_phase3_20130502_combined_chr5 \
--gwas-summary /data/module6/Practicals/Practical3_MR_DrugEffects/data/GWAS_sumstats/MD/formatted_MD \
--beqtl-summary /data/module6/Practicals/Practical3_MR_DrugEffects/data/eQTLGen/eQTLGen_cis_eQTLs \
--gene ENSG00000113161 \
--diff-freq-prop 0.1 \
--out ./output/HMGCR_MD_SMR

Once the analysis is completed, we can have a look at the results:

# Reading the result file
cat ./output/HMGCR_MD_SMR.smr

Consider the following questions:

Question 3:

Is there a significant association between HMGCR expression and depression risk?

If so, in what direction? (hint: p_SMR < 0.05 defines statistical significance)

Question 4:

What are some of the follow-up analyses that we can do?

Individuals with depression have been found to show increased platelet activity, with platelet activity reduced after antidepressant treatment. Interestingly, platelets are a major reservoir of serotonin in humans, and serotonin’s role in depression is supported by the efficacy of selective serotonin reuptake inhibitors as a treatment of depression.

We thus repeat the SMR analysis with platelet count as an outcome.

# Copy and paste the following command to run an SMR analysis with platelet count
smr \
--bfile /data/module6/Practicals/Practical3_MR_DrugEffects/data/LD_ref/1000G_phase3_20130502_combined_chr5 \
--gwas-summary /data/module6/Practicals/Practical3_MR_DrugEffects/data/GWAS_sumstats/Platelet/formatted_platelet_count_nodup \
--beqtl-summary /data/module6/Practicals/Practical3_MR_DrugEffects/data/eQTLGen/eQTLGen_cis_eQTLs \
--gene ENSG00000113161 \
--diff-freq-prop 0.1 \
--out ./output/HMGCR_PlateletCount_SMR

Once the analysis is completed, we can have a look at the results:

# Reading the result file
cat ./output/HMGCR_PlateletCount_SMR.smr

Consider the following questions:

Question 5:

Is there a significant association between HMGCR expression and platelet count?

If so, in what direction? (hint: p_SMR < 0.05 defines statistical significance)

Question 6:

A p_HEIDI < 0.01 indicates heterogeneity at the locus: the observed association is likely driven by two or more distinct causal variants in LD, rather than by a single shared variant.

What is the p_HEIDI in this analysis?

What does it mean?
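The three SMR runs above differ only in the GWAS file and the output prefix, so they can also be written as one loop. The sketch below reuses the paths and flags from the commands in this practical; the function name run_smr and the DRY_RUN switch are made up for this illustration. By default it only prints the commands so that they can be checked before anything is run.

```shell
# Sketch: run the same SMR analysis for several outcome traits.
# DRY_RUN=1 (default) prints each command; set DRY_RUN=0 to execute them.
DATA_DIR=/data/module6/Practicals/Practical3_MR_DrugEffects/data
DRY_RUN="${DRY_RUN:-1}"

run_smr() {
    local gwas label
    gwas="$1"    # GWAS file prefix, relative to GWAS_sumstats/
    label="$2"   # trait label used in the output file name
    # Rebuild the positional parameters as the full smr command line
    set -- smr \
        --bfile "$DATA_DIR/LD_ref/1000G_phase3_20130502_combined_chr5" \
        --gwas-summary "$DATA_DIR/GWAS_sumstats/$gwas" \
        --beqtl-summary "$DATA_DIR/eQTLGen/eQTLGen_cis_eQTLs" \
        --gene ENSG00000113161 \
        --diff-freq-prop 0.1 \
        --out "./output/HMGCR_${label}_SMR"
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        # Keep smr from reading the loop's input
        "$@" < /dev/null
    fi
}

# One line per outcome: GWAS file prefix, then trait label
while read -r gwas label; do
    run_smr "$gwas" "$label"
done <<'EOF'
Lipids_sumstats/formatted_LDL LDL
MD/formatted_MD MD
Platelet/formatted_platelet_count_nodup PlateletCount
EOF
```

Adding another outcome then only requires one more line in the list, provided its summary statistics are formatted in the same way as the existing files.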

Extension questions:

Well done on finishing the practical! You have completed your first drug repurposing analysis using MR.

For the extension analysis, think about the following scenarios:

Statins have previously been reported to show off-target inhibition of ITGAL and HDAC2. Design a workflow to investigate whether these off-target effects may mediate the potential anti-depressive effects of statins.

By default, SMR chooses the strongest eQTL as the genetic instrument. An SNP named rs12916 has previously been used as a genetic instrument to proxy for HMGCR inhibition. Repeat the SMR analysis in the practical using rs12916 as the genetic instrument.

The eQTL dataset used in this practical was profiled using whole blood. Considering that the brain is biologically more relevant to the aetiology of depression, we can perform a sensitivity analysis using eQTLs generated in brain tissues. Repeat the SMR analysis in the practical using eQTL data generated in the prefrontal cortex.

Hint

replace

“/data/module6/Practicals/Practical3_MR_DrugEffects/data/eQTLGen/eQTLGen_cis_eQTLs”

with

“/data/module6/Practicals/Practical3_MR_DrugEffects/data/PSYCHENCODE/PsychENCODE_cis_eqtl_HCP100_summary/Gandal_PsychENCODE_eQTL_HCP100+gPCs20_QTLtools”

Coloc is a tool that can be used to test whether two association datasets share a common causal variant⁴. As a sensitivity analysis, run the coloc analysis between HMGCR expression in the brain and platelet counts.

# Run the coloc analysis
Rscript /data/module6/Practicals/Practical3_MR_DrugEffects/coloc/coloc.R
# Let’s take a look at the results
cat ./output/HMGCR_PlateletCount_coloc_summary.csv

Pick a drug that you are interested in repurposing and design a workflow to investigate its repurposing potential to treat a disease.

References:

1. Davies NM, Holmes MV, Davey Smith G. Reading Mendelian randomisation studies: a guide, glossary, and checklist for clinicians. BMJ. 2018;362:k601.
2. Jiang J-C, Hu C, McIntosh AM, Shah S. Investigating the potential anti-depressive mechanisms of statins: a transcriptomic and Mendelian randomization analysis. Transl Psychiatry. 2023;13:110.
3. Zhu Z, Zhang F, Hu H, Bakshi A, Robinson MR, Powell JE, Montgomery GW, Goddard ME, Wray NR, Visscher PM, et al. Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet. 2016;48:481-487.
4. Giambartolomei C, Vukcevic D, Schadt EE, Franke L, Hingorani AD, Wallace C, Plagnol V. Bayesian test for colocalisation between pairs of genetic association studies using summary statistics. PLoS Genet. 2014;10:e1004383.

Footnotes

HEIDI = HEterogeneity In Dependent Instruments. HEIDI tests for heterogeneity of SNP effects in a locus, under the null hypothesis that there is only one causal SNP and the other significant SNPs are only significant due to LD with the causal SNP. p_HEIDI < 0.01 is a rejection of the null hypothesis that there is only one causal SNP, and suggests that there are multiple independent associations at this locus.