Data

The following are links to download summary data for papers written within the centre, listed in reverse chronological order.

Robinson et al. 2015, Nature Genetics

Title: Population genetic differentiation of height and body mass index across Europe

  1. withinfam_summary_ht_bmi_release_March2016.tar.gz: within-family genome-wide association statistics from the sibling pair data, generated using the QFAM approach in PLINK. Estimates are for HapMap3 SNPs that passed QC thresholds of MAF > 1%, HWE < 1×10⁻⁶, and imputation info score > 0.4. Columns are chromosome, SNP rs ID, the coded allele, the coded allele frequency, effect size, approximate standard error, and permutation p-value from 100,000 permutations.
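The downloads on this page are gzip-compressed tar archives. As a minimal sketch, the following lists the regular files inside an archive without extracting anything, using only Python's standard `tarfile` module. The function name `list_archive_members` is hypothetical, and the names of the files inside each archive are not documented here, so inspect the listing before reading any table.

```python
import tarfile
from typing import List


def list_archive_members(archive_path: str) -> List[str]:
    """Return the names of the regular files stored in a .tar.gz archive."""
    # "r:gz" opens the archive for reading with gzip decompression.
    with tarfile.open(archive_path, mode="r:gz") as archive:
        # Keep regular files only; directories and links are skipped.
        return [member.name for member in archive.getmembers() if member.isfile()]
```

For example, `list_archive_members("withinfam_summary_ht_bmi_release_March2016.tar.gz")` would return the member file names once the archive has been downloaded to the working directory.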

Yang et al. 2015, Nature Genetics

Title: Estimation of genetic variance from imputed sequence variants reveals negligible missing heritability for human height and body mass index

  1. LDSCORE_release_July2015.tar.gz: per-SNP and per-segment LD scores calculated from 44,126 unrelated individuals and ~17M imputed variants. Columns are SNP, per-SNP LD score, and per-segment LD score.
  2. GWAS_summary_release_July2015.tar.gz: GWAS summary data. Columns are SNP, the coded allele, effect size, and standard error.
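The GWAS summary file above lists an effect size and a standard error but no p-value. As a hedged sketch, a two-sided p-value can be approximated from those two columns with a Wald test under a standard normal distribution. This is an illustrative assumption, not necessarily the test used in the paper, and the function name `wald_z_and_p` is hypothetical.

```python
import math
from typing import Tuple


def wald_z_and_p(effect_size: float, standard_error: float) -> Tuple[float, float]:
    """Return the Wald z-score and its two-sided normal-approximation p-value."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    z = effect_size / standard_error
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return z, p
```

An effect size of 0.0196 with a standard error of 0.01 gives z = 1.96 and a p-value close to 0.05.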

Lloyd-Jones et al. 2017, American Journal of Human Genetics

Title: The Genetic Architecture of Gene Expression in Peripheral Blood

  1. See the online web app to download the full set of summary statistics, or download the data in the Summary Mendelian Randomization (SMR) binary format at SMR.

Zhu et al. 2017, bioRxiv

Title: Causal associations between risk factors and common diseases inferred from GWAS summary data

The summary-level GWAS data for 23 phenotypes come from GERA and UK Biobank. Each data set is available as a whitespace-separated table in GCTA-COJO format. Columns are SNP, effect allele, the other allele, frequency of the effect allele, effect size, standard error, p-value, and sample size.

  1. GERA data Zhu_et_al_GSMR_2017_GERA.tar.gz: Details of the quality control of the genotyped and imputed data can be found in Zhu et al. (2017 bioRxiv). The individual-level ICD-9 codes were classified into 22 common diseases. We added an additional trait, ‘Disease Count’ (a count of the number of diseases affecting each individual), as a crude measure of the general health status of each individual. We performed a genome-wide association analysis for each of the 23 phenotypes with age, gender and the first 20 PCs fitted as covariates.

  2. UK Biobank data Zhu_et_al_GSMR_2017_UKB.tar.gz: Details of the quality control of the genotyped and imputed data can be found in Zhu et al. (2017 bioRxiv). Individual-level ICD-10 codes were available in the UKB data. To match the diseases in GERA, we classified the phenotypes into 22 common diseases by projecting the ICD-10 codes to the classifications of ICD-9 codes in GERA, taking into account the self-reported disease status. We also added the trait ‘Disease Count’. We conducted genome-wide association analyses for the 23 phenotypes using the same approach as above. Note that we did not perform the association analysis for dermatophytosis because the number of cases was too small (an issue with PLINK2). For insomnia, iron deficiency anemias, macular degeneration, peripheral vascular disease and acute reaction to stress, we performed the association analyses only on a subset of SNPs (SNPs in HapMap2 and those in common with the top associated SNPs for the risk factors).
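The eight-column GCTA-COJO layout described above can be read with a few lines of standard-library Python. The sketch below is illustrative only. It assumes each table starts with a single header line (conventionally `SNP A1 A2 freq b se p N` in GCTA-COJO files), which it skips without checking the names. The `CojoRecord` class and `parse_cojo` function are hypothetical names, and the sample size is parsed as a float in case it is not stored as an integer.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator


@dataclass
class CojoRecord:
    snp: str
    effect_allele: str
    other_allele: str
    freq: float   # frequency of the effect allele
    beta: float   # effect size
    se: float     # standard error
    p: float      # p-value
    n: float      # sample size


def parse_cojo(lines: Iterable[str], has_header: bool = True) -> Iterator[CojoRecord]:
    """Yield one CojoRecord per non-empty row of a whitespace-separated table."""
    rows = iter(lines)
    if has_header:
        next(rows, None)  # assumption: the first line is a header
    for line in rows:
        fields = line.split()  # splits on any run of spaces or tabs
        if not fields:
            continue  # ignore blank lines
        if len(fields) != 8:
            raise ValueError(f"expected 8 columns, got {len(fields)}: {line!r}")
        snp, a1, a2, freq, beta, se, p, n = fields
        yield CojoRecord(snp, a1, a2, float(freq), float(beta),
                         float(se), float(p), float(n))
```

Because the function accepts any iterable of lines, it can be passed an open text file directly, for example `with open(path) as handle: records = list(parse_cojo(handle))`, where `path` points to a table extracted from one of the archives.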