Mendelian Randomization Practical Exercise
Applied Research Question:
Does having higher proinflammatory CRP causally increase your blood pressure? an image caption Source: David Evans



CRP AND BLOOD PRESSURE
Graphical Representation
Q1. As you’re running the commands below, fill in the graphical representation of the IV analysis with the appropriate variables and beta-coefficients:
an image caption Source: David Evans

# Read in the dataset 
# setwd("/Users/uqdhwang/GnG_Winter_School/Module3/MR")
example <- read.table("data.txt", header=T)
attach(example)

# Look at the data 
head(example) 
##   rs3091244      CRP      SBP   INCOME      HDL
## 1         2 1.644716 118.0210 8.748793 1.358594
## 2         1 1.930808 128.6874 7.808321 1.211619
## 3         2 1.850321 100.9221 7.770443 1.395778
## 4         1 1.874545 118.0758 8.331293 1.178837
## 5         1 2.312943 124.5230 6.328416 1.303221
## 6         1 1.889228 117.8196 5.561556 1.340147
summary(example)
##    rs3091244          CRP             SBP             INCOME      
##  Min.   :0.000   Min.   :1.292   Min.   : 86.02   Min.   : 3.851  
##  1st Qu.:1.000   1st Qu.:1.863   1st Qu.:113.49   1st Qu.: 6.817  
##  Median :1.000   Median :1.997   Median :119.99   Median : 7.487  
##  Mean   :1.003   Mean   :2.000   Mean   :120.07   Mean   : 7.482  
##  3rd Qu.:2.000   3rd Qu.:2.135   3rd Qu.:126.75   3rd Qu.: 8.146  
##  Max.   :2.000   Max.   :2.909   Max.   :158.78   Max.   :11.437  
##       HDL        
##  Min.   :0.7798  
##  1st Qu.:1.1316  
##  Median :1.1981  
##  Mean   :1.1986  
##  3rd Qu.:1.2663  
##  Max.   :1.5528
# Units: SNP (0,1,2), CRP mmol/L, SBP mmHg, Income per $10,000, HDL mmol/L


Observational analyses
Q2. What does the observational linear regression of SBP on CRP show?
A.

# Run observational OLS regression (ordinary least squares) for BP & CRP
 summary(lm(SBP~CRP)) 
## 
## Call:
## lm(formula = SBP ~ CRP)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -30.742  -6.050   0.029   6.046  33.124 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  81.7846     0.8997   90.90   <2e-16 ***
## CRP          19.1372     0.4475   42.77   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.063 on 9998 degrees of freedom
## Multiple R-squared:  0.1547, Adjusted R-squared:  0.1546 
## F-statistic:  1829 on 1 and 9998 DF,  p-value: < 2.2e-16
# Plot the observational association between BP and CRP
plot(CRP,SBP)
abline(lm(SBP~CRP),col="red")


Q3. What does the OLS regression of the CRP SNP rs3091244 on CRP show?
A.

# Observational OLS regression of CRP on CRP SNP
 summary(lm(CRP~rs3091244))
## 
## Call:
## lm(formula = CRP ~ rs3091244)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.70752 -0.13580 -0.00284  0.13284  0.94315 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.958318   0.003481  562.58   <2e-16 ***
## rs3091244   0.041937   0.002838   14.78   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2004 on 9998 degrees of freedom
## Multiple R-squared:  0.02137,    Adjusted R-squared:  0.02127 
## F-statistic: 218.3 on 1 and 9998 DF,  p-value: < 2.2e-16
# Plot the relationship between CRP and rs3091244
plot(rs3091244, CRP)
abline(lm(CRP~rs3091244),col="red")


Q4. What do the OLS regressions of potential confounders (income, HDL) show?
A.

# Confounders
 summary(lm(SBP~INCOME))
## 
## Call:
## lm(formula = SBP ~ INCOME)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -34.970  -6.564  -0.074   6.697  39.724 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 127.02281    0.74463 170.585   <2e-16 ***
## INCOME       -0.92975    0.09865  -9.425   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.814 on 9998 degrees of freedom
## Multiple R-squared:  0.008806,   Adjusted R-squared:  0.008707 
## F-statistic: 88.82 on 1 and 9998 DF,  p-value: < 2.2e-16
 summary(lm(CRP~INCOME))
## 
## Call:
## lm(formula = CRP ~ INCOME)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.65842 -0.13132 -0.00149  0.13027  0.93315 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.422891   0.014767  164.08   <2e-16 ***
## INCOME      -0.056469   0.001956  -28.86   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1946 on 9998 degrees of freedom
## Multiple R-squared:  0.07692,    Adjusted R-squared:  0.07683 
## F-statistic: 833.2 on 1 and 9998 DF,  p-value: < 2.2e-16
 summary(lm(INCOME~rs3091244))
## 
## Call:
## lm(formula = INCOME ~ rs3091244)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.6327 -0.6640  0.0042  0.6641  3.9552 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 7.480832   0.017283 432.851   <2e-16 ***
## rs3091244   0.001432   0.014091   0.102    0.919    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9949 on 9998 degrees of freedom
## Multiple R-squared:  1.033e-06,  Adjusted R-squared:  -9.899e-05 
## F-statistic: 0.01033 on 1 and 9998 DF,  p-value: 0.9191
summary(lm(SBP~HDL))
## 
## Call:
## lm(formula = SBP ~ HDL)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -33.017  -6.391   0.016   6.366  36.199 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 154.1662     1.1505  133.99   <2e-16 ***
## HDL         -28.4494     0.9566  -29.74   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.448 on 9998 degrees of freedom
## Multiple R-squared:  0.08127,    Adjusted R-squared:  0.08118 
## F-statistic: 884.4 on 1 and 9998 DF,  p-value: < 2.2e-16
summary(lm(CRP~HDL))
## 
## Call:
## lm(formula = CRP ~ HDL)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -0.6569 -0.1311 -0.0013  0.1296  0.8961 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   2.6829     0.0237   113.2   <2e-16 ***
## HDL          -0.5695     0.0197   -28.9   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.1946 on 9998 degrees of freedom
## Multiple R-squared:  0.07711,    Adjusted R-squared:  0.07701 
## F-statistic: 835.3 on 1 and 9998 DF,  p-value: < 2.2e-16
summary(lm(HDL~rs3091244))
## 
## Call:
## lm(formula = HDL ~ rs3091244)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.41878 -0.06671 -0.00056  0.06778  0.35160 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.196085   0.001716 697.195   <2e-16 ***
## rs3091244   0.002531   0.001399   1.809   0.0704 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.09876 on 9998 degrees of freedom
## Multiple R-squared:  0.0003273,  Adjusted R-squared:  0.0002273 
## F-statistic: 3.273 on 1 and 9998 DF,  p-value: 0.07045



*CHECK*
Add the observational-based association variables and parameters to your graphical representation.

Q5. What are the implications for these income and HDL associations for the observational CRP-SBP association?
A.

Q6. Compare the unadjusted and covariate-adjusted OLS observational regressions. What do they show?
A.

# Run a covariate-adjusted model for the association between CRP & BP
summary(lm(SBP~CRP))
## 
## Call:
## lm(formula = SBP ~ CRP)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -30.742  -6.050   0.029   6.046  33.124 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  81.7846     0.8997   90.90   <2e-16 ***
## CRP          19.1372     0.4475   42.77   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.063 on 9998 degrees of freedom
## Multiple R-squared:  0.1547, Adjusted R-squared:  0.1546 
## F-statistic:  1829 on 1 and 9998 DF,  p-value: < 2.2e-16
summary(lm(SBP~CRP+INCOME+HDL))
## 
## Call:
## lm(formula = SBP ~ CRP + INCOME + HDL)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -32.320  -5.999   0.096   5.988  33.429 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 107.71279    1.87440  57.465   <2e-16 ***
## CRP          16.83413    0.47291  35.597   <2e-16 ***
## INCOME        0.20413    0.09293   2.196   0.0281 *  
## HDL         -19.06234    0.93616 -20.362   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 8.88 on 9996 degrees of freedom
## Multiple R-squared:  0.1886, Adjusted R-squared:  0.1883 
## F-statistic: 774.3 on 3 and 9996 DF,  p-value: < 2.2e-16



Q7. What could explain this?
A.



MR/IV Analyses: Wald Estimator
Obtain the required estimates to compute the causal effect using the Wald Estimator from your dataset. Note, however, that an advantage of the Wald estimator is that you do not need individual level datasets to do the MR analysis. Reported SNP effects from published GWAS is sufficient and you can take the SNP effect on the exposure from one GWAS, and the SNP effect on outcome from a different GWAS sample (“Two sample MR”)

Q8. Run the necessary OLS regressions to compute a Wald estimator

# OLS regression of CRP on CRP SNP
# OLS regression of BP on CRP SNP
 summary(lm(CRP~rs3091244))
## 
## Call:
## lm(formula = CRP ~ rs3091244)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.70752 -0.13580 -0.00284  0.13284  0.94315 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.958318   0.003481  562.58   <2e-16 ***
## rs3091244   0.041937   0.002838   14.78   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2004 on 9998 degrees of freedom
## Multiple R-squared:  0.02137,    Adjusted R-squared:  0.02127 
## F-statistic: 218.3 on 1 and 9998 DF,  p-value: < 2.2e-16
 summary(lm(SBP~rs3091244))
## 
## Call:
## lm(formula = SBP ~ rs3091244)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -33.947  -6.572  -0.056   6.696  38.712 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 120.1679     0.1712 701.792   <2e-16 ***
## rs3091244    -0.1014     0.1396  -0.726    0.468    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.857 on 9998 degrees of freedom
## Multiple R-squared:  5.278e-05,  Adjusted R-squared:  -4.723e-05 
## F-statistic: 0.5277 on 1 and 9998 DF,  p-value: 0.4676



Q9. From the above output, compute the causal effect using the Wald estimator, as well as it’s SE and 95% CI. What do the results show and what do they mean?
A.

Wald estimator causal Beta =
SE =
95% CI =

Q10. Rerun the observational OLS of CRP and SBP and compare with the results from the Wald estimator. What do you notice about the Beta and SEs?
A.

# Observational OLS regression
summary(lm(SBP~CRP))
## 
## Call:
## lm(formula = SBP ~ CRP)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -30.742  -6.050   0.029   6.046  33.124 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  81.7846     0.8997   90.90   <2e-16 ***
## CRP          19.1372     0.4475   42.77   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.063 on 9998 degrees of freedom
## Multiple R-squared:  0.1547, Adjusted R-squared:  0.1546 
## F-statistic:  1829 on 1 and 9998 DF,  p-value: < 2.2e-16



MR/IV Analyses: TSLS
Two-stage least squares (TSLS) MR requires individual level data, and the exposure, SNP and outcome in the one sample (“Single sample MR”).

# Call the AER library to run TSLS (if the AER package has been installed)
library(AER)
## Loading required package: car
## Loading required package: carData
## Loading required package: lmtest
## Loading required package: zoo
## 
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
## 
##     as.Date, as.Date.numeric
## Loading required package: sandwich
## Loading required package: survival
# If AER has not been installed, run the command below first:
#install.packages("AER")


# General format for TSLS command: 
# summary(ivreg(Outcome~Exposure | Instrument))


Q11. What do the TSLS results show and did it differ to the Wald estimator?
A.

# TSLS regression
summary(ivreg(SBP~CRP | rs3091244))
## 
## Call:
## ivreg(formula = SBP ~ CRP | rs3091244)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -34.4770  -6.7058  -0.1068   6.8650  39.4240 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  124.904      6.797  18.376   <2e-16 ***
## CRP           -2.418      3.398  -0.712    0.477    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.06 on 9998 degrees of freedom
## Multiple R-Squared: -0.04156,    Adjusted R-squared: -0.04166 
## Wald test: 0.5066 on 1 and 9998 DF,  p-value: 0.4766


Manual TSLS
To better understand what two-stage least squares regression is doing, let’s perform it manually.

# Regress the exposure (CRP) on the instrument (rs3091244)
 First_Stage <- lm(CRP~rs3091244)


Create a new variable (Pred_CRP), which is the predicted values of the exposure (CRP) from the first-stage regression with the instrument. You can think of this as the expected value of exposure (CRP) given we know the particular individual’s genotype (for CRP SNP rs3091244).

# Create predicted CRP values, from first-stage regression
Pred_CRP <- predict(First_Stage)

#Have a quick look at these values
table(Pred_CRP)
## Pred_CRP
##  1.9583177725362 2.00025487885585  2.0421919851755 
##             2478             5015             2507
plot(rs3091244, CRP)
abline(lm(CRP~ rs3091244), col="red")

#Now regress the outcome variable (BP) on the predicted values of CRP, from the first-stage regression.

# Second stage regression
Second_Stage <- lm(SBP~Pred_CRP)

# Look at the results:
 summary(Second_Stage)
## 
## Call:
## lm(formula = SBP ~ Pred_CRP)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -33.947  -6.572  -0.056   6.696  38.712 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  124.904      6.660  18.754   <2e-16 ***
## Pred_CRP      -2.418      3.329  -0.726    0.468    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 9.857 on 9998 degrees of freedom
## Multiple R-squared:  5.278e-05,  Adjusted R-squared:  -4.723e-05 
## F-statistic: 0.5277 on 1 and 9998 DF,  p-value: 0.4676


Q12. Are they the same as ‘ivreg’ TSLS function?
A.
*CHECK*
Are all the variables and parameters now complete in your graphical representation?

Weak instruments bias
Assessing instrument strength with the F-stat (looking for >10).
For Single SNP MR, the F-statistic is calculated as:
an image caption Source: David Evans


where R2 is the variance explained in exposure by the SNP, and N is number of individuals in the study. This statistic is available in the output for OLS and TSLS.

Q14. Looking at the F-statistic, determine if weak instruments may be an issue
A.

#Look at F-stat from the first-stage linear regression
 summary(lm(CRP~rs3091244))
## 
## Call:
## lm(formula = CRP ~ rs3091244)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.70752 -0.13580 -0.00284  0.13284  0.94315 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1.958318   0.003481  562.58   <2e-16 ***
## rs3091244   0.041937   0.002838   14.78   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2004 on 9998 degrees of freedom
## Multiple R-squared:  0.02137,    Adjusted R-squared:  0.02127 
## F-statistic: 218.3 on 1 and 9998 DF,  p-value: < 2.2e-16
#Look at F-stat from ‘diagnostics’ by AER package
summary(ivreg(SBP~CRP | rs3091244), diagnostics=T)
## 
## Call:
## ivreg(formula = SBP ~ CRP | rs3091244)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -34.4770  -6.7058  -0.1068   6.8650  39.4240 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  124.904      6.797  18.376   <2e-16 ***
## CRP           -2.418      3.398  -0.712    0.477    
## 
## Diagnostic tests:
##                   df1  df2 statistic  p-value    
## Weak instruments    1 9998    218.34  < 2e-16 ***
## Wu-Hausman          1 9997     50.93 1.02e-12 ***
## Sargan              0   NA        NA       NA    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 10.06 on 9998 degrees of freedom
## Multiple R-Squared: -0.04156,    Adjusted R-squared: -0.04166 
## Wald test: 0.5066 on 1 and 9998 DF,  p-value: 0.4766


Discuss:
Q14. How would having weak instruments change the causal estimate of CRP on SBP, in this study (single sample)?