SMR

Summary-data-based Mendelian Randomization

Overview

This software tool implements a method to test for association between gene expression and a complex trait arising from pleiotropy or causality, using summary-level data from GWAS and expression quantitative trait loci (eQTL) studies (Zhu et al. 2016 Nat Genet). It can be used to prioritize genes underlying GWAS hits for follow-up functional studies. The software is developed by Futao Zhang, Zhihong Zhu and Jian Yang at the Queensland Brain Institute, The University of Queensland. The software and eQTL summary data are available on the Download page. Bug reports or questions: jian.yang@uq.edu.au

Tutorial

Running SMR using summary-level statistics from GWAS and eQTL studies

smr --bfile mydata --gwas-summary mygwas.ma --beqtl-summary myeqtl --out mysmr --thread-num 10

--bfile reads individual-level SNP genotype data in PLINK binary format (.bed, .bim and .fam files) from a reference sample, used for LD estimation.
--gwas-summary reads summary-level data from a GWAS. The input format follows that of the GCTA-COJO analysis (http://cnsgenomics.com/software/gcta/cojo.html).

mygwas.ma
SNP A1 A2 freq b se p n
rs1001 A G 0.8493  0.0024  0.0055  0.6653 129850
rs1002 C G 0.03606 0.0034 0.0115 0.7659 129799
rs1003 A C 0.5128 0.045 0.038 0.2319 129830
......
Columns are SNP, the effect (coded) allele, the other allele, frequency of the effect allele, effect size, standard error, p-value and sample size. The header names are not treated as keywords; the header line is skipped by the program, so the columns must be in the order above. Important: “A1” must be the effect allele, “A2” the other allele, and “freq” the frequency of “A1”. Note: for a case-control study, the effect size should be log(odds ratio), with its corresponding standard error.
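The column layout above is simple enough to read with a few lines of code. The following Python sketch is illustrative only and is not part of SMR: `GwasRecord`, `parse_ma` and `or_to_log_scale` are hypothetical names, columns are assumed to be whitespace-separated in the order listed, and the odds-ratio helper assumes a 95% confidence interval that is symmetric on the log scale (the usual Wald interval).

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GwasRecord:
    snp: str
    a1: str      # effect (coded) allele
    a2: str      # other allele
    freq: float  # frequency of A1
    b: float     # effect of A1; log(odds ratio) for case-control studies
    se: float
    p: float
    n: float


def parse_ma(lines: List[str]) -> List[GwasRecord]:
    """Parse .ma text: the header line is skipped and columns are read by position."""
    records = []
    for line in lines[1:]:  # header names are not keywords, so ignore them
        fields = line.split()
        if len(fields) != 8:  # hypothetical policy: skip blank or malformed lines
            continue
        freq, b, se, p, n = map(float, fields[3:8])
        records.append(GwasRecord(fields[0], fields[1], fields[2], freq, b, se, p, n))
    return records


def or_to_log_scale(odds_ratio: float, ci_lower: float, ci_upper: float,
                    z: float = 1.959964) -> Tuple[float, float]:
    """Convert an odds ratio and its 95% CI to log(OR) and the SE of log(OR)."""
    b = math.log(odds_ratio)
    se = (math.log(ci_upper) - math.log(ci_lower)) / (2 * z)
    return b, se


example = [
    "SNP A1 A2 freq b se p n",
    "rs1001 A G 0.8493  0.0024  0.0055  0.6653 129850",
    "rs1002 C G 0.03606 0.0034 0.0115 0.7659 129799",
]
records = parse_ma(example)
```

Because the program reads columns by position, converting case-control results to the log scale before writing the .ma file is the step most easily forgotten.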


--beqtl-summary reads summary-level data from an eQTL study in binary format. We store eQTL summary data in three separate files: .esi (SNP information, in the same format as the PLINK .bim file), .epi (probe information) and .besd (eQTL summary statistics in binary format). See Data Management for more information. We have prepared the data from the Westra study (Westra et al. 2013 Nat Genet) in this format; it is available on the Download page.

--out saves the results of the SMR analysis in a .smr file (text format).

mysmr.smr
ProbeID Probe_Chr Gene Probe_bp SNP SNP_Chr SNP_bp A1 A2 Freq b_GWAS se_GWAS p_GWAS b_eQTL se_eQTL p_eQTL b_SMR se_SMR p_SMR p_HET nsnp
prb01 1 Gene1 1001 rs01 1 1011 C T 0.95 -0.024 0.0063 1.4e-04 0.36 0.048 6.4e-14 -0.0668 0.0197 6.8e-04 NA NA
prb02 1 Gene2 2001 rs02 1 2011 G C 0.0747 0.0034 0.0062 5.8e-01 0.62 0.0396 2e-55 0.0055 0.01 5.8e-01 4.17e-01 28
......
Columns are probe ID, probe chromosome, gene name, probe position, SNP name, SNP chromosome, SNP position, the effect (coded) allele, the other allele, frequency of the effect allele (estimated from the reference sample), effect size from GWAS, SE from GWAS, p-value from GWAS, effect size from the eQTL study, SE from the eQTL study, p-value from the eQTL study, effect size from SMR, SE from SMR, p-value from SMR, p-value from the HEIDI (HEterogeneity In Dependent Instruments) test, and number of SNPs used in the HEIDI test.

Missing values are represented by "NA".
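As a check on how the SMR columns relate to each other, the approximate SMR test statistic of Zhu et al. (2016) can be recomputed from the GWAS and eQTL columns: with z_zy = b_GWAS/se_GWAS and z_zx = b_eQTL/se_eQTL, T_SMR = z_zy² z_zx² / (z_zy² + z_zx²), which is approximately chi-squared with 1 df, and b_SMR = b_GWAS/b_eQTL. The Python sketch below is not the SMR implementation; `parse_smr` and `smr_test` are hypothetical names, and computing se_SMR as |b_SMR|/√T_SMR is an assumption that happens to reproduce the example rows above.

```python
import math
from typing import Dict, List, Optional, Tuple


def parse_smr(lines: List[str]) -> List[Dict[str, Optional[str]]]:
    """Read .smr text (header line + rows); "NA" fields become None."""
    header = lines[0].split()
    rows = []
    for line in lines[1:]:
        fields = line.split()
        if len(fields) != len(header):  # skip blank or truncated lines
            continue
        rows.append({k: (None if v == "NA" else v) for k, v in zip(header, fields)})
    return rows


def smr_test(b_gwas: float, se_gwas: float,
             b_eqtl: float, se_eqtl: float) -> Tuple[float, float, float]:
    """Approximate SMR test for one instrument; returns (b_SMR, se_SMR, p_SMR)."""
    z_zy = b_gwas / se_gwas  # SNP effect on the trait, as a z-score
    z_zx = b_eqtl / se_eqtl  # SNP effect on expression, as a z-score
    t_smr = (z_zy ** 2 * z_zx ** 2) / (z_zy ** 2 + z_zx ** 2)
    b_smr = b_gwas / b_eqtl
    se_smr = abs(b_smr) / math.sqrt(t_smr)  # assumption: back-calculated from T_SMR
    p_smr = math.erfc(math.sqrt(t_smr / 2))  # upper tail of chi-squared with 1 df
    return b_smr, se_smr, p_smr


example = [
    "ProbeID Probe_Chr Gene Probe_bp SNP SNP_Chr SNP_bp A1 A2 Freq b_GWAS se_GWAS p_GWAS "
    "b_eQTL se_eQTL p_eQTL b_SMR se_SMR p_SMR p_HET nsnp",
    "prb01 1 Gene1 1001 rs01 1 1011 C T 0.95 -0.024 0.0063 1.4e-04 0.36 0.048 6.4e-14 "
    "-0.0668 0.0197 6.8e-04 NA NA",
    "prb02 1 Gene2 2001 rs02 1 2011 G C 0.0747 0.0034 0.0062 5.8e-01 0.62 0.0396 2e-55 "
    "0.0055 0.01 5.8e-01 4.17e-01 28",
]
rows = parse_smr(example)
results = [
    smr_test(float(r["b_GWAS"]), float(r["se_GWAS"]), float(r["b_eQTL"]), float(r["se_eQTL"]))
    for r in rows
]
```

For prb01 this gives b_SMR ≈ -0.0667, se_SMR ≈ 0.0196 and p_SMR ≈ 6.8e-4, consistent with the reported values up to rounding. The NA values in p_HET and nsnp become None, so downstream code has to handle probes for which HEIDI was not run.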

--thread-num specifies the number of OpenMP threads for parallel computing. The default value is 1.

Optional Commands

# Specify a method for HEIDI test

smr --bfile mydata --gwas-summary mygwas.ma --beqtl-summary myeqtl --heidi-mtd 0 --out mysmr

--heidi-mtd specifies the method for the HEIDI test: 0 for the original HEIDI test as in Zhu et al. (2016 Nature Genetics), and 1 for a new HEIDI test (beta version for testing). The default value is 1. The new approach uses up to the top 20 SNPs in the cis-eQTL region (including the top cis-eQTL) for the heterogeneity test, because our latest simulations show that the power of the HEIDI test first increases and then decreases as the number of SNPs (m) increases, peaking at m ≈ 20.

# Filter SNPs by MAF (in the reference sample)

smr --bfile mydata --gwas-summary mygwas.ma --beqtl-summary myeqtl --maf 0.01 --out mysmr

--maf removes SNPs whose minor allele frequency (MAF) in the reference sample is below the specified threshold.
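The MAF filter amounts to computing each SNP's allele frequency f in the reference genotypes and comparing the minor allele frequency min(f, 1 - f) with the threshold. A Python sketch, assuming genotypes are coded as 0/1/2 copies of one allele with None for missing calls; the function names are hypothetical, and treating a SNP whose MAF equals the threshold as passing is an assumption, not documented SMR behaviour.

```python
from typing import List, Optional


def allele_freq(genotypes: List[Optional[int]]) -> Optional[float]:
    """Frequency of the counted allele from 0/1/2 allele counts (None = missing call)."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return None
    return sum(called) / (2 * len(called))


def passes_maf(genotypes: List[Optional[int]], maf_threshold: float) -> bool:
    """Keep a SNP only if min(f, 1 - f) reaches the threshold (>= is an assumption)."""
    f = allele_freq(genotypes)
    if f is None:
        return False  # assumption: SNPs with no called genotypes are dropped
    return min(f, 1 - f) >= maf_threshold
```

Taking min(f, 1 - f) makes the filter independent of which allele the genotype coding counts.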

# Include or exclude a subset of individuals

smr --bfile mydata --gwas-summary mygwas.ma --beqtl-summary myeqtl --keep myindi.list --out mysmr

--keep includes a subset of individuals in the reference sample for analysis.

--remove excludes a subset of individuals in the reference sample from the analysis.

myindi.list
F001 S001
F002 S002
F003 S001
...
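Each line of the list holds two IDs, which we take to be the family ID and individual ID (the first two columns of a PLINK .fam file). In the example, S001 appears under both F001 and F003, so samples are matched on the pair of IDs rather than on the individual ID alone. A Python sketch of the filtering; the function names are hypothetical, and applying --keep before --remove when both are given is an assumption.

```python
from typing import List, Optional, Set, Tuple

Sample = Tuple[str, str]  # (family ID, individual ID)


def read_id_list(lines: List[str]) -> Set[Sample]:
    """Read one individual per line: first column family ID, second individual ID."""
    ids = set()
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            ids.add((fields[0], fields[1]))
    return ids


def filter_samples(fam: List[Sample], keep: Optional[Set[Sample]] = None,
                   remove: Optional[Set[Sample]] = None) -> List[Sample]:
    """Apply a keep list, then a remove list, preserving the reference-sample order."""
    kept = [s for s in fam if keep is None or s in keep]
    return [s for s in kept if remove is None or s not in remove]


fam = [("F001", "S001"), ("F002", "S002"), ("F003", "S001"), ("F004", "S004")]
keep = read_id_list(["F001 S001", "F003 S001"])
```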

# Include or exclude a subset of eQTL summary data

smr --bfile mydata --gwas-summary mygwas.ma --beqtl-summary myeqtl --extract-snp mysnp.list --extract-probe myprobe.list --out mysmr

--extract-snp extracts a subset of SNPs for analysis.

--exclude-snp excludes a subset of SNPs from analysis.

mysnp.list
rs1001
rs1002
rs1003
...

--extract-probe extracts a subset of probes for analysis.

--exclude-probe excludes a subset of probes from analysis.

myprobe.list
probe1001
probe1002
probe1003
...
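Both list types hold one ID per line, and the four options act as set filters on the eQTL summary data. A Python sketch with hypothetical names, assuming eQTL results are represented as (probe, SNP) pairs and that all lists given on the command line are applied together:

```python
from typing import Iterable, List, Optional, Set, Tuple


def read_single_column_list(lines: Iterable[str]) -> Set[str]:
    """Read one ID per line (first field only); blank lines are ignored."""
    return {line.split()[0] for line in lines if line.strip()}


def subset_eqtls(pairs: Iterable[Tuple[str, str]],
                 extract_snp: Optional[Set[str]] = None,
                 exclude_snp: Optional[Set[str]] = None,
                 extract_probe: Optional[Set[str]] = None,
                 exclude_probe: Optional[Set[str]] = None) -> List[Tuple[str, str]]:
    """Filter (probe, SNP) pairs: extract lists keep only listed IDs, exclude lists drop them."""
    kept = []
    for probe, snp in pairs:
        if extract_snp is not None and snp not in extract_snp:
            continue
        if exclude_snp is not None and snp in exclude_snp:
            continue
        if extract_probe is not None and probe not in extract_probe:
            continue
        if exclude_probe is not None and probe in exclude_probe:
            continue
        kept.append((probe, snp))
    return kept


snps = read_single_column_list(["rs1001", "rs1002", "", "rs1003"])
pairs = [("probe1001", "rs1001"), ("probe1001", "rs9999"), ("probe2000", "rs1002")]
```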

# Other parameters

smr --bfile mydata --gwas-summary mygwas.ma --beqtl-summary myeqtl --peqtl-smr 5e-8 --ld-pruning 0.9 --peqtl-heidi 1.57e-3 --heidi-m 3 --cis-wind 2000 --thread-num 5 --out mysmr

--peqtl-smr specifies the p-value threshold used to select the top associated eQTL for the SMR test. The default value is 5.0e-8. By default, the SMR analysis is run only in cis regions; the SMR analysis in trans regions is described below.

--peqtl-heidi specifies the eQTL p-value threshold used to select eQTLs for the HEIDI test. The default value is 1.57e-3, which is equivalent to a chi-squared value (df = 1) of 10.
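The equivalence quoted above follows from the chi-squared distribution with 1 df, whose upper-tail probability at x is erfc(√(x/2)). The Python sketch below (hypothetical helper names) converts in both directions; the inverse uses simple bisection rather than a statistics library so that only the standard library is needed.

```python
import math


def chisq1_pvalue(chisq: float) -> float:
    """Upper-tail p-value of a chi-squared statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(chisq / 2))


def chisq1_from_pvalue(p: float, lo: float = 0.0, hi: float = 1000.0) -> float:
    """Invert chisq1_pvalue by bisection; the p-value falls as chisq grows."""
    for _ in range(200):
        mid = (lo + hi) / 2
        if chisq1_pvalue(mid) > p:
            lo = mid  # p-value still too large: the statistic must be bigger
        else:
            hi = mid
    return (lo + hi) / 2
```

chisq1_pvalue(10) returns about 1.57e-3, and chisq1_from_pvalue(5e-8) returns about 29.7, the chi-squared value corresponding to the default --peqtl-smr threshold.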

--ld-pruning specifies the LD r-squared threshold used to prune SNPs (eQTLs) in the HEIDI test: SNPs in high LD with the top associated eQTL are removed. The default value is 0.9.

--heidi-m specifies the minimum number of eQTLs required for the HEIDI test. The HEIDI test is skipped if the number of SNPs is smaller than this threshold, because with too few SNPs the test has little power to detect heterogeneity and may give misleading results. The default value is 3.

--cis-wind defines a window centred on the probe within which cis-eQTLs (passing a p-value threshold) are selected for the SMR analysis. The default value is 2000 kb.
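Taken together, --peqtl-heidi, --ld-pruning and --heidi-m, plus the 20-SNP cap of the new HEIDI method (--heidi-mtd 1), describe how SNPs are chosen for the heterogeneity test. The Python sketch below is a simplified illustration under stated assumptions, not the SMR implementation: `select_heidi_snps` is a hypothetical name, it prunes only against the top eQTL as described here (the real program may apply further LD filtering), the strict p < threshold comparison is an assumption, and counting the top eQTL toward the --heidi-m minimum is an assumption.

```python
from typing import Dict, List, Optional


def select_heidi_snps(
    pvals: Dict[str, float],        # eQTL p-value for each SNP in the region
    r2_with_top: Dict[str, float],  # LD r-squared of each SNP with the top eQTL
    top_snp: str,
    p_heidi: float = 1.57e-3,       # mirrors the --peqtl-heidi default
    ld_pruning: float = 0.9,        # mirrors the --ld-pruning default
    max_snps: int = 20,             # cap used by the new HEIDI method (--heidi-mtd 1)
    min_snps: int = 3,              # mirrors the --heidi-m default
) -> Optional[List[str]]:
    """Choose SNPs for a HEIDI-style test; None means the test would be skipped."""
    candidates = [
        snp for snp, p in pvals.items()
        if snp != top_snp and p < p_heidi and r2_with_top[snp] <= ld_pruning
    ]
    candidates.sort(key=lambda snp: pvals[snp])  # most significant first
    chosen = [top_snp] + candidates[: max_snps - 1]  # top eQTL always included
    if len(chosen) < min_snps:  # assumption: the top eQTL counts toward the minimum
        return None
    return chosen


pvals = {"top": 1e-20, "a": 1e-10, "b": 1e-5, "c": 0.01, "d": 1e-8}
r2 = {"a": 0.95, "b": 0.2, "c": 0.1, "d": 0.5}
```

Here SNP "a" is dropped for high LD with the top eQTL and "c" for failing the p-value threshold, leaving ["top", "d", "b"].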

# Specify a target SNP for the SMR and HEIDI tests

By default, the top cis-eQTL is used as the target in the SMR analysis, i.e. it is the instrument in the SMR test and is tested against the other cis-eQTLs in the region for heterogeneity in the HEIDI test. You can also specify the target SNP with the option below. Note that this option overrides the p-value threshold set by --peqtl-smr (--peqtl-heidi still applies).

smr --bfile mydata --gwas-summary mygwas.ma --beqtl-summary myeqtl --target-snp rs12345 --out mysmr

--target-snp specifies a SNP as the target for the SMR and HEIDI tests as described above.

# Turn off the HEIDI test

smr --bfile mydata --gwas-summary mygwas.ma --beqtl-summary myeqtl --heidi-off --out mysmr

--heidi-off turns off the HEIDI test.

# SMR and HEIDI tests in trans regions

Trans-eQTLs are defined as eQTLs more than 5 Mb away from the probe.

smr --bfile mydata --gwas-summary mygwas.ma --beqtl-summary myeqtl --out mysmr --trans --trans-wind 1000

--trans turns on SMR and HEIDI tests in trans regions.

--trans-wind defines a window centred on the top associated trans-eQTL within which SNPs (passing a p-value threshold) are selected for the SMR and HEIDI tests. The default value is 1000 kb.
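The 5 Mb definition translates directly into a distance check. In the Python sketch below, the function names, the treatment of SNPs on other chromosomes as trans, and the use of 5e-8 (the --peqtl-smr default) as the threshold for picking the top trans-eQTL are all assumptions made for illustration; the sketch does not model --trans-wind.

```python
from typing import Iterable, Optional, Tuple

Eqtl = Tuple[str, str, int, float]  # (SNP, chromosome, bp position, eQTL p-value)


def is_trans(probe_chr: str, probe_bp: int, snp_chr: str, snp_bp: int,
             min_distance_bp: int = 5_000_000) -> bool:
    """True if the SNP is more than 5 Mb from the probe (or on another chromosome)."""
    if probe_chr != snp_chr:
        return True  # assumption: SNPs on other chromosomes count as trans
    return abs(snp_bp - probe_bp) > min_distance_bp


def top_trans_eqtl(probe_chr: str, probe_bp: int, eqtls: Iterable[Eqtl],
                   p_threshold: float = 5e-8) -> Optional[Eqtl]:
    """Most significant trans-eQTL passing the (assumed) p-value threshold, or None."""
    hits = [e for e in eqtls
            if e[3] < p_threshold and is_trans(probe_chr, probe_bp, e[1], e[2])]
    return min(hits, key=lambda e: e[3]) if hits else None
```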

Citation

Zhihong Zhu, Futao Zhang, Han Hu, Andrew Bakshi, Matthew R. Robinson, Joseph E. Powell, Grant W. Montgomery, Michael E. Goddard, Naomi R. Wray, Peter M. Visscher and Jian Yang (2016) Integration of summary data from GWAS and eQTL studies predicts complex trait gene targets. Nat Genet, 48: 481-487.