Mendelian Randomization Practical
Exercise
Applied Research Question:
Does having higher proinflammatory CRP causally increase your blood
pressure?
CRP
AND BLOOD PRESSURE
Graphical Representation
Q1. As you’re running the commands below, fill in the graphical
representation of the IV analysis with the appropriate variables and
beta-coefficients:
# Read in the dataset
# setwd("/Users/uqdhwang/GnG_Winter_School/Module3/MR")
example <- read.table("data.txt", header=T)
attach(example)
# Look at the data
head(example)
## rs3091244 CRP SBP INCOME HDL
## 1 2 1.644716 118.0210 8.748793 1.358594
## 2 1 1.930808 128.6874 7.808321 1.211619
## 3 2 1.850321 100.9221 7.770443 1.395778
## 4 1 1.874545 118.0758 8.331293 1.178837
## 5 1 2.312943 124.5230 6.328416 1.303221
## 6 1 1.889228 117.8196 5.561556 1.340147
summary(example)
## rs3091244 CRP SBP INCOME
## Min. :0.000 Min. :1.292 Min. : 86.02 Min. : 3.851
## 1st Qu.:1.000 1st Qu.:1.863 1st Qu.:113.49 1st Qu.: 6.817
## Median :1.000 Median :1.997 Median :119.99 Median : 7.487
## Mean :1.003 Mean :2.000 Mean :120.07 Mean : 7.482
## 3rd Qu.:2.000 3rd Qu.:2.135 3rd Qu.:126.75 3rd Qu.: 8.146
## Max. :2.000 Max. :2.909 Max. :158.78 Max. :11.437
## HDL
## Min. :0.7798
## 1st Qu.:1.1316
## Median :1.1981
## Mean :1.1986
## 3rd Qu.:1.2663
## Max. :1.5528
# Units: SNP (0,1,2), CRP mmol/L, SBP mmHg, Income per $10,000, HDL mmol/L
Observational analyses
Q2. What does
the observational linear regression of SBP on CRP show?
A.
# Run observational OLS regression (ordinary least squares) for BP & CRP
summary(lm(SBP~CRP))
##
## Call:
## lm(formula = SBP ~ CRP)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.742 -6.050 0.029 6.046 33.124
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 81.7846 0.8997 90.90 <2e-16 ***
## CRP 19.1372 0.4475 42.77 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.063 on 9998 degrees of freedom
## Multiple R-squared: 0.1547, Adjusted R-squared: 0.1546
## F-statistic: 1829 on 1 and 9998 DF, p-value: < 2.2e-16
# Plot the observational association between BP and CRP
plot(CRP,SBP)
abline(lm(SBP~CRP),col="red")
Q3. What does the OLS regression of the CRP SNP rs3091244 on
CRP show?
A.
# Observational OLS regression of CRP on CRP SNP
summary(lm(CRP~rs3091244))
##
## Call:
## lm(formula = CRP ~ rs3091244)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.70752 -0.13580 -0.00284 0.13284 0.94315
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.958318 0.003481 562.58 <2e-16 ***
## rs3091244 0.041937 0.002838 14.78 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2004 on 9998 degrees of freedom
## Multiple R-squared: 0.02137, Adjusted R-squared: 0.02127
## F-statistic: 218.3 on 1 and 9998 DF, p-value: < 2.2e-16
# Plot the relationship between CRP and rs3091244
plot(rs3091244, CRP)
abline(lm(CRP~rs3091244),col="red")
Q4. What do the OLS regressions of potential confounders
(income, HDL) show?
A.
# Confounders
summary(lm(SBP~INCOME))
##
## Call:
## lm(formula = SBP ~ INCOME)
##
## Residuals:
## Min 1Q Median 3Q Max
## -34.970 -6.564 -0.074 6.697 39.724
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 127.02281 0.74463 170.585 <2e-16 ***
## INCOME -0.92975 0.09865 -9.425 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.814 on 9998 degrees of freedom
## Multiple R-squared: 0.008806, Adjusted R-squared: 0.008707
## F-statistic: 88.82 on 1 and 9998 DF, p-value: < 2.2e-16
summary(lm(CRP~INCOME))
##
## Call:
## lm(formula = CRP ~ INCOME)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.65842 -0.13132 -0.00149 0.13027 0.93315
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.422891 0.014767 164.08 <2e-16 ***
## INCOME -0.056469 0.001956 -28.86 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1946 on 9998 degrees of freedom
## Multiple R-squared: 0.07692, Adjusted R-squared: 0.07683
## F-statistic: 833.2 on 1 and 9998 DF, p-value: < 2.2e-16
summary(lm(INCOME~rs3091244))
##
## Call:
## lm(formula = INCOME ~ rs3091244)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.6327 -0.6640 0.0042 0.6641 3.9552
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.480832 0.017283 432.851 <2e-16 ***
## rs3091244 0.001432 0.014091 0.102 0.919
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9949 on 9998 degrees of freedom
## Multiple R-squared: 1.033e-06, Adjusted R-squared: -9.899e-05
## F-statistic: 0.01033 on 1 and 9998 DF, p-value: 0.9191
summary(lm(SBP~HDL))
##
## Call:
## lm(formula = SBP ~ HDL)
##
## Residuals:
## Min 1Q Median 3Q Max
## -33.017 -6.391 0.016 6.366 36.199
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 154.1662 1.1505 133.99 <2e-16 ***
## HDL -28.4494 0.9566 -29.74 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.448 on 9998 degrees of freedom
## Multiple R-squared: 0.08127, Adjusted R-squared: 0.08118
## F-statistic: 884.4 on 1 and 9998 DF, p-value: < 2.2e-16
summary(lm(CRP~HDL))
##
## Call:
## lm(formula = CRP ~ HDL)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.6569 -0.1311 -0.0013 0.1296 0.8961
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.6829 0.0237 113.2 <2e-16 ***
## HDL -0.5695 0.0197 -28.9 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.1946 on 9998 degrees of freedom
## Multiple R-squared: 0.07711, Adjusted R-squared: 0.07701
## F-statistic: 835.3 on 1 and 9998 DF, p-value: < 2.2e-16
summary(lm(HDL~rs3091244))
##
## Call:
## lm(formula = HDL ~ rs3091244)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.41878 -0.06671 -0.00056 0.06778 0.35160
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.196085 0.001716 697.195 <2e-16 ***
## rs3091244 0.002531 0.001399 1.809 0.0704 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.09876 on 9998 degrees of freedom
## Multiple R-squared: 0.0003273, Adjusted R-squared: 0.0002273
## F-statistic: 3.273 on 1 and 9998 DF, p-value: 0.07045
*CHECK*
Add the observational-based association
variables and parameters to your graphical representation.
Q5. What are the implications for these income and HDL
associations for the observational CRP-SBP association?
A.
Q6. Compare the unadjusted and covariate-adjusted OLS
observational regressions. What do they show?
A.
# Run a covariate-adjusted model for the association between CRP & BP
summary(lm(SBP~CRP))
##
## Call:
## lm(formula = SBP ~ CRP)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.742 -6.050 0.029 6.046 33.124
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 81.7846 0.8997 90.90 <2e-16 ***
## CRP 19.1372 0.4475 42.77 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.063 on 9998 degrees of freedom
## Multiple R-squared: 0.1547, Adjusted R-squared: 0.1546
## F-statistic: 1829 on 1 and 9998 DF, p-value: < 2.2e-16
summary(lm(SBP~CRP+INCOME+HDL))
##
## Call:
## lm(formula = SBP ~ CRP + INCOME + HDL)
##
## Residuals:
## Min 1Q Median 3Q Max
## -32.320 -5.999 0.096 5.988 33.429
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 107.71279 1.87440 57.465 <2e-16 ***
## CRP 16.83413 0.47291 35.597 <2e-16 ***
## INCOME 0.20413 0.09293 2.196 0.0281 *
## HDL -19.06234 0.93616 -20.362 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 8.88 on 9996 degrees of freedom
## Multiple R-squared: 0.1886, Adjusted R-squared: 0.1883
## F-statistic: 774.3 on 3 and 9996 DF, p-value: < 2.2e-16
Q7. What could explain this?
A.
MR/IV Analyses: Wald Estimator
Obtain
the required estimates to compute the causal effect using the Wald
Estimator from your dataset. Note, however, that an advantage of the
Wald estimator is that you do not need individual level datasets to do
the MR analysis. Reported SNP effects from published GWAS is sufficient
and you can take the SNP effect on the exposure from one GWAS, and the
SNP effect on outcome from a different GWAS sample (“Two sample MR”)
Q8. Run the necessary OLS regressions to compute a Wald
estimator
# OLS regression of CRP on CRP SNP
# OLS regression of BP on CRP SNP
summary(lm(CRP~rs3091244))
##
## Call:
## lm(formula = CRP ~ rs3091244)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.70752 -0.13580 -0.00284 0.13284 0.94315
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.958318 0.003481 562.58 <2e-16 ***
## rs3091244 0.041937 0.002838 14.78 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2004 on 9998 degrees of freedom
## Multiple R-squared: 0.02137, Adjusted R-squared: 0.02127
## F-statistic: 218.3 on 1 and 9998 DF, p-value: < 2.2e-16
summary(lm(SBP~rs3091244))
##
## Call:
## lm(formula = SBP ~ rs3091244)
##
## Residuals:
## Min 1Q Median 3Q Max
## -33.947 -6.572 -0.056 6.696 38.712
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 120.1679 0.1712 701.792 <2e-16 ***
## rs3091244 -0.1014 0.1396 -0.726 0.468
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.857 on 9998 degrees of freedom
## Multiple R-squared: 5.278e-05, Adjusted R-squared: -4.723e-05
## F-statistic: 0.5277 on 1 and 9998 DF, p-value: 0.4676
Q9. From the above output, compute the causal effect
using the Wald estimator, as well as it’s SE and 95% CI. What do the
results show and what do they mean?
A.
Wald
estimator causal Beta =
SE =
95% CI =
Q10.
Rerun the observational OLS of CRP and SBP and compare with the results
from the Wald estimator. What do you notice about the Beta and SEs?
A.
# Observational OLS regression
summary(lm(SBP~CRP))
##
## Call:
## lm(formula = SBP ~ CRP)
##
## Residuals:
## Min 1Q Median 3Q Max
## -30.742 -6.050 0.029 6.046 33.124
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 81.7846 0.8997 90.90 <2e-16 ***
## CRP 19.1372 0.4475 42.77 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.063 on 9998 degrees of freedom
## Multiple R-squared: 0.1547, Adjusted R-squared: 0.1546
## F-statistic: 1829 on 1 and 9998 DF, p-value: < 2.2e-16
MR/IV Analyses: TSLS
Two-stage least
squares (TSLS) MR requires individual level data, and the exposure, SNP
and outcome in the one sample (“Single sample MR”).
# Call the AER library to run TSLS (if the AER package has been installed)
library(AER)
## Loading required package: car
## Loading required package: carData
## Loading required package: lmtest
## Loading required package: zoo
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: survival
# If AER has not been installed, run the command below first:
#install.packages("AER")
# General format for TSLS command:
# summary(ivreg(Outcome~Exposure | Instrument))
Q11. What do the TSLS results show and did it differ to
the Wald estimator?
A.
# TSLS regression
summary(ivreg(SBP~CRP | rs3091244))
##
## Call:
## ivreg(formula = SBP ~ CRP | rs3091244)
##
## Residuals:
## Min 1Q Median 3Q Max
## -34.4770 -6.7058 -0.1068 6.8650 39.4240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 124.904 6.797 18.376 <2e-16 ***
## CRP -2.418 3.398 -0.712 0.477
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.06 on 9998 degrees of freedom
## Multiple R-Squared: -0.04156, Adjusted R-squared: -0.04166
## Wald test: 0.5066 on 1 and 9998 DF, p-value: 0.4766
Manual TSLS
To better understand what two-stage least
squares regression is doing, let’s perform it manually.
# Regress the exposure (CRP) on the instrument (rs3091244)
First_Stage <- lm(CRP~rs3091244)
Create a new variable (Pred_CRP), which is the predicted values
of the exposure (CRP) from the first-stage regression with the
instrument. You can think of this as the expected value of exposure
(CRP) given we know the particular individual’s genotype (for CRP SNP
rs3091244).
# Create predicted CRP values, from first-stage regression
Pred_CRP <- predict(First_Stage)
#Have a quick look at these values
table(Pred_CRP)
## Pred_CRP
## 1.9583177725362 2.00025487885585 2.0421919851755
## 2478 5015 2507
plot(rs3091244, CRP)
abline(lm(CRP~ rs3091244), col="red")
#Now regress the outcome variable (BP) on the predicted values of CRP, from the first-stage regression.
# Second stage regression
Second_Stage <- lm(SBP~Pred_CRP)
# Look at the results:
summary(Second_Stage)
##
## Call:
## lm(formula = SBP ~ Pred_CRP)
##
## Residuals:
## Min 1Q Median 3Q Max
## -33.947 -6.572 -0.056 6.696 38.712
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 124.904 6.660 18.754 <2e-16 ***
## Pred_CRP -2.418 3.329 -0.726 0.468
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 9.857 on 9998 degrees of freedom
## Multiple R-squared: 5.278e-05, Adjusted R-squared: -4.723e-05
## F-statistic: 0.5277 on 1 and 9998 DF, p-value: 0.4676
Q12. Are they the same as ‘ivreg’ TSLS function?
A.
*CHECK*
Are all the variables and parameters
now complete in your graphical representation?
Weak
instruments bias
Assessing instrument strength with the
F-stat (looking for >10).
For Single SNP MR, the F-statistic is
calculated as:
where R2 is the variance explained in exposure
by the SNP, and N is number of individuals in the study. This
statistic is available in the output for OLS and TSLS.
Q14. Looking at the F-statistic, determine if weak instruments
may be an issue
A.
#Look at F-stat from the first-stage linear regression
summary(lm(CRP~rs3091244))
##
## Call:
## lm(formula = CRP ~ rs3091244)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.70752 -0.13580 -0.00284 0.13284 0.94315
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.958318 0.003481 562.58 <2e-16 ***
## rs3091244 0.041937 0.002838 14.78 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2004 on 9998 degrees of freedom
## Multiple R-squared: 0.02137, Adjusted R-squared: 0.02127
## F-statistic: 218.3 on 1 and 9998 DF, p-value: < 2.2e-16
#Look at F-stat from ‘diagnostics’ by AER package
summary(ivreg(SBP~CRP | rs3091244), diagnostics=T)
##
## Call:
## ivreg(formula = SBP ~ CRP | rs3091244)
##
## Residuals:
## Min 1Q Median 3Q Max
## -34.4770 -6.7058 -0.1068 6.8650 39.4240
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 124.904 6.797 18.376 <2e-16 ***
## CRP -2.418 3.398 -0.712 0.477
##
## Diagnostic tests:
## df1 df2 statistic p-value
## Weak instruments 1 9998 218.34 < 2e-16 ***
## Wu-Hausman 1 9997 50.93 1.02e-12 ***
## Sargan 0 NA NA NA
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 10.06 on 9998 degrees of freedom
## Multiple R-Squared: -0.04156, Adjusted R-squared: -0.04166
## Wald test: 0.5066 on 1 and 9998 DF, p-value: 0.4766
Discuss:
Q14. How would having weak instruments
change the causal estimate of CRP on SBP, in this study (single
sample)?