UQ Winter School 2024

Joana Revez & Loïc Thibaut


Here we will meta-analyse results from a genome-wide association study (GWAS) of height by Wood et al. 2014 Nat Genet and a GWAS of height by Jiang et al. 2019 Nat Genet.



Download GWAS summary statistics

  1. Wood et al. 2014 Nat Genet
    • Description: This is a GWAS based on 253,288 individuals. It is, in itself, a meta-analysis of height GWAS across different cohorts.
    • Summary statistics available in: GIANT webpage (file named Download Height GZIP).
  2. Jiang et al. 2019 Nat Genet
    • Description: This is a GWAS of height based on data from 456,422 participants from the UK Biobank (UKB).
    • Summary statistics available in: fastGWA data portal (look up “Standing height” and download sumstats by clicking on the button “Download summary statistics” on the top right corner).


To download sumstats directly to the cluster, copy the link to the file you want to download and use wget -O <file_name.txt> <URL>.

NOTE: This step has already been run for you, so that you don’t have to wait for the files to be downloaded.

# ****************************
# Download summary statistics
# ****************************
# Go to directory where files will be saved
cd /data/module1/7_metaPrac

## Wood et al
url=http://portals.broadinstitute.org/collaboration/giant/images/0/01/GIANT_HEIGHT_Wood_et_al_2014_publicrelease_HapMapCeuFreq.txt.gz
wget -O Height_Wood2014.gz $url

## Jiang et al
url=https://yanglab.westlake.edu.cn/data/fastgwa_data/UKB/50.v1.1.fastGWA.gz
wget -O Height_Jiang2019.gz $url



Visualise sumstats

Use zcat to see the first lines of the file and assess the information provided.

# Wood et al. 2014
zcat /data/module1/7_metaPrac/Height_Wood2014.gz | head
## MarkerName   Allele1 Allele2 Freq.Allele1.HapMapCEU  b   SE  p   N
## rs4747841    A   G   0.551   -0.0011 0.0029  0.70    253213
## rs4749917    T   C   0.436   0.0011  0.0029  0.70    253213
## rs737656 A   G   0.367   -0.0062 0.0030  0.042   253116
## rs737657 A   G   0.358   -0.0062 0.0030  0.041   252156
## rs7086391    T   C   0.12    -0.0087 0.0038  0.024   248425
## rs878177 T   C   0.3 0.014   0.0031  8.2e-06 251271
## rs878178 A   T   0.644   0.0067  0.0031  0.029   253086
## rs12219605   T   G   0.427   0.0011  0.0029  0.70    253213
## rs3763688    C   G   0.144   -0.0022 0.0045  0.62    253056

Columns are: MarkerName, SNP rsID; Allele1, effect allelle,; Allele2, other allele; Freq.Allele1.HapMapCEU, Allele 1 frequency in EUR individuals from HapMap; b, SE, and p, effect size, standard error and P-value of the effect of Allele 1; N, sample size.


# Jiang et al. 2019
zcat /data/module1/7_metaPrac/Height_Jiang2019.gz | head
## CHR  SNP POS A1  A2  N   AF1 BETA    SE  P
## 1    rs144155419 717587  G   A   445877  0.989495    0.00183444  0.00722402  0.799545
## 1    rs58276399  731718  T   C   435350  0.888746    -2.38349e-05    0.00237193  0.991982
## 1    rs141242758 734349  T   C   436593  0.888847    -9.42285e-05    0.00236962  0.96828
## 1    rs28544273  751343  T   A   447512  0.878876    0.000566575 0.00225319  0.801463
## 1    rs28527770  751756  T   C   447816  0.878591    0.000583722 0.00225047  0.795343
## 1    rs3115860   753405  C   A   452415  0.127989    -0.00104942 0.00218969  0.631757
## 1    rs3131970   753425  T   C   452802  0.124182    -0.000304986    0.00221626  0.890546
## 1    rs2073813   753541  G   A   451742  0.872927    0.000910638 0.00219765  0.678604
## 1    rs3131969   754182  A   G   452129  0.128369    -0.00110491 0.00218736  0.613465

Columns are: CHR, Chromosome; SNP, SNP rsID; POS, SNP base pair position; A1, effect allelle,; A2, other allele; ; N, sample size; AF1, A1 frequency in GWAS sample (EUR individuals from UKB); BETA, SE, and P, effect size, standard error and P-value of the effect of A1.

NOTE: In some datasets the effects may have been estimated for the alternative allele. It is very important to read through the documentation associated with the summary statistics to understand what is the effect allele.



Format summary statistics

We will use PLINK to meta-analyse the Wood et al. and Jiang et al. GWAS. As outlined in PLINK’s website, the program requires the effect size estimates to be in a column named “BETA”. As we saw above, the Wood et al. GWAS has betas in a column called “b”. Here, we will change the name of that column to “BETA”.

We also saw above that the Wood et al. summary statistics do not include information about chromosome (CHR) and SNP base pair position (BP). Although PLINK does not require these fields to meta-analyse results, it expects those columns as input. Here, we will add two dummy columns to the Wood et al. sumstats, representing the CHR and BP.

We will run these steps in the command line with UNIX code. Alternatively, you could open the file in R, make these changes and save, but it would take longer.

# Go to your home directory
cd ~

# Change column name
zcat /data/module1/7_metaPrac/Height_Wood2014.gz |
    sed -e '1s/b/BETA/' |
    awk '{if (NR==1) {print $0,"CHR","BP"} else {print $0,1,1}}' | 
    gzip > Height_Wood2014_edited.gz     

## The 1st line unzips the original file. The pipe at the end of the line ("|") makes turns the output of that line into the input of the next command
## The 2nd line changes the name of the column containing betas from "b" to "BETA"
## The 3rd line adds two extra dummy columns where the first line is "CHR" and "BP"
## The 4th line zips the output and saves it into a file named "Height_Wood2014_edited.gz"

# View edited file
zcat Height_Wood2014_edited.gz | head

# Note, we already have a copy of this in /data/module1/7_metaPrac/
## MarkerName   Allele1 Allele2 Freq.Allele1.HapMapCEU  BETA    SE  p   N CHR BP
## rs4747841    A   G   0.551   -0.0011 0.0029  0.70    253213 1 1
## rs4749917    T   C   0.436   0.0011  0.0029  0.70    253213 1 1
## rs737656 A   G   0.367   -0.0062 0.0030  0.042   253116 1 1
## rs737657 A   G   0.358   -0.0062 0.0030  0.041   252156 1 1
## rs7086391    T   C   0.12    -0.0087 0.0038  0.024   248425 1 1
## rs878177 T   C   0.3 0.014   0.0031  8.2e-06 251271 1 1
## rs878178 A   T   0.644   0.0067  0.0031  0.029   253086 1 1
## rs12219605   T   G   0.427   0.0011  0.0029  0.70    253213 1 1
## rs3763688    C   G   0.144   -0.0022 0.0045  0.62    253056 1 1



Meta-analysis example in R

Now, as a last exercise, we will meta-analyse one SNP (rs10000940) in R to understand how PLINK works. We will use the formulas presented in Table 1 of Willer et al. 2010 for an inverse variance-based meta-analysis:


  • \(\beta_{i}\) : effect estimate for study i
  • \(se_{i}\) : standard error for study i
  • \(w_{i}\) = \(\frac{1}{se_{i}^2}\)
  • \(se = \sqrt{\frac{1} {\sum_{i} w_i}}\)
  • \(\beta = \frac{\sum_{i} \beta_{i} w_i}{\sum_{i} w_i}\)
  • \(Z = \frac{\beta}{se}\)


snp <- "rs10000940"

# Get rs10000940 summary statistics from Wood et al.
rs10000940_wood <- wood2014 %>% filter(MarkerName == snp)

b_wood <- rs10000940_wood$b
se_wood <- rs10000940_wood$SE

# Get rs10000940 summary statistics from Jiang et al.
rs10000940_jiang <- jiang2019 %>% filter(SNP == snp)

b_jiang <- rs10000940_jiang$BETA
se_jiang <- rs10000940_jiang$SE

# Compute weights for each study: the inverse of the variance
w_wood <- 1 / se_wood^2
w_jiang <- 1 / se_jiang^2

# Calculate the SE
se_meta <- sqrt(1 / (w_wood + w_jiang))

# Calculate the effect size BETA
b_meta <- (b_wood * w_wood + b_jiang * w_jiang) / (w_wood + w_jiang)

# Calculate the Z-score
z <- b_meta / se_meta

# Calculate the p-value
p_meta  <- 2 * pnorm(z)

# Compare with plink results
filter(meta, SNP == snp) %>% select(BETA, P)
## # A tibble: 1 × 2
##      BETA            P
##     <dbl>        <dbl>
## 1 -0.0074 0.0000000166
b_meta 
## [1] -0.007358003
p_meta
## [1] 1.654751e-08