The following are links to download summary data for papers written within the centre, listed in reverse chronological order.
Sidorenko and Kassam et al. 2019
Title: The effect of X-linked dosage compensation on complex trait variation
- UKBv3_Xchr_20traits.tar.gz: Summary statistics of X-chromosome association studies in the UK Biobank for 20 complex traits.
- chrX_eqtl_besd.tar.gz: X-chromosome eQTL summary statistics from the CAGE consortium and GTEx in SMR format.
Wu et al. 2019, Nature Communications
Title: Genome-wide association study of medication-use and associated disease in the UK Biobank
- 23_medication-taking_GWAS_summary_statistics: Summary statistics of genome-wide association studies for 23 medication-taking traits.
- 23_medication-taking_GWAS_summary_statistics_README.pdf: Description of the dataset.
Yap et al. 2018, Nature Communications
Title: Dissection of genetic variation and evidence for pleiotropy in male pattern baldness
- mpb_bolt_lmm_aut_x.tab.zip: Summary statistics for the male pattern baldness (MPB) genome-wide association study.
- MPB_GWAS_summary_statistics_README.pdf: Description of the dataset.
Xue et al. 2018, Nature Communications
Title: Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes
- Xue_et_al_T2D_META_Nat_Commun_2018.gz: Summary statistics of genome-wide association of type 2 diabetes.
- Xue_et_al_T2D_META_Nat_Commun_2018.pdf: Description of the dataset.
Yengo et al. 2018, Human Molecular Genetics
Title: Meta-analysis of genome-wide association studies for height and body mass index in ~700,000 individuals of European ancestry
- Meta-analysis_Locke_et_al+UKBiobank_2018.txt.gz: Summary statistics of genome-wide association of body mass index. Columns are chromosome (CHR), genomic position (POS) on human genome build 37 (hg19), SNP rs identifier (SNP), allele tested for association (Tested_Allele), alternative allele (Other_Allele), frequency of the tested allele in the Health and Retirement Study (Freq_Tested_Allele_in_HRS), estimated effect of the tested allele (BETA), estimated standard error of the effect size of the tested allele (SE), association p-value (P) and sample size (N).
- Meta-analysis_Wood_et_al+UKBiobank_2018.txt.gz: Summary statistics of genome-wide association of height. Columns are chromosome (CHR), genomic position (POS) on human genome build 37 (hg19), SNP rs identifier (SNP), allele tested for association (Tested_Allele), alternative allele (Other_Allele), frequency of the tested allele in the Health and Retirement Study (Freq_Tested_Allele_in_HRS), estimated effect of the tested allele (BETA), estimated standard error of the effect size of the tested allele (SE), association p-value (P) and sample size (N).
- 610_genes_prioritised_with_SMR_for_height_Yengo_et_al_HMG_2018.txt and 110_genes_prioritised_with_SMR_for_BMI_Yengo_et_al_HMG_2018.txt: Results from Summary-data-based Mendelian Randomisation (SMR) analyses that prioritise genes whose local genetic control correlates with that of the focal traits in the genome-wide association studies of height and body mass index (BMI; Yengo et al. 2018, HMG). These analyses prioritised 610 and 110 genes associated with height and BMI, respectively. Each file contains tables with 21 columns, as described here (http://cnsgenomics.com/software/smr/#SMR&HEIDIanalysis), and the last column (“tissue”) indicates the tissue in which gene expression was measured. SMR analyses were performed using expression QTL (eQTL) data from the GTEx project.
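As a minimal sketch of how the height and BMI summary statistics described above might be read, the Python snippet below parses a table with the ten listed columns and keeps SNPs below a p-value threshold. It assumes the files are whitespace-delimited with a header row that uses the column names given in parentheses. The helper names, the threshold of 5e-8 and the two example rows are illustrative assumptions, not part of the released data.

```python
import gzip
import io

# Column names as listed in the file description (assumed to match the header row).
COLUMNS = ["CHR", "POS", "SNP", "Tested_Allele", "Other_Allele",
           "Freq_Tested_Allele_in_HRS", "BETA", "SE", "P", "N"]


def open_sumstats(path):
    """Open a gzip-compressed summary-statistics file as a text handle."""
    return gzip.open(path, "rt")


def read_sumstats(handle):
    """Yield one dict per SNP from a whitespace-delimited text handle."""
    header = handle.readline().split()
    if header != COLUMNS:
        raise ValueError(f"unexpected header: {header}")
    for line in handle:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        row = dict(zip(header, fields))
        for key in ("Freq_Tested_Allele_in_HRS", "BETA", "SE", "P"):
            row[key] = float(row[key])
        row["POS"] = int(row["POS"])
        row["N"] = int(float(row["N"]))  # tolerate sample sizes written as 690000.0
        yield row


def significant(rows, threshold=5e-8):
    """Keep rows whose p-value is below the (assumed) significance threshold."""
    return [r for r in rows if r["P"] < threshold]


# Synthetic two-row example; the values are made up for illustration only.
EXAMPLE = """CHR POS SNP Tested_Allele Other_Allele Freq_Tested_Allele_in_HRS BETA SE P N
1 1000 rs0000001 A G 0.25 0.020 0.003 2.6e-11 690000
1 2000 rs0000002 T C 0.40 0.001 0.003 7.4e-01 690000
"""

hits = significant(read_sumstats(io.StringIO(EXAMPLE)))
print([r["SNP"] for r in hits])
```

For a downloaded file, `read_sumstats(open_sumstats(path))` would stream the rows without loading the whole table into memory. Missing values (for example `NA`) are not handled in this sketch.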
Zhu et al. 2018, Nature Communications
Title: Causal associations between risk factors and common diseases inferred from GWAS summary data
The summary-level GWAS data for 23 phenotypes were from GERA and UK Biobank. Each data set has been made available as a whitespace-separated table in GCTA-COJO format. Columns are SNP, the effect allele, the other allele, frequency of the effect allele, effect size, standard error, p-value and sample size.
- GERA data: Details of quality controls of the genotyped and imputed data can be found in Zhu et al. (2018 Nat. Commun.). The individual-level ICD-9 codes were classified into 22 common diseases. We added an additional trait ‘Disease Count’ (a count of the number of diseases affecting each individual) as a crude measure of general health status of each individual.
- UK Biobank data: Details of quality controls of the genotyped and imputed data can be found in Zhu et al. (2018 Nat. Commun.). Individual-level ICD-10 codes were available in the UKB data. To match the diseases in GERA, we classified the phenotypes into 22 common diseases by projecting the ICD-10 codes to the classifications of ICD-9 codes in GERA taking into account the self-reported disease status. Note that we did not perform the association analysis for dermatophytosis because the number of cases was too small. We only performed the association analyses on a subset of SNPs (in common with the top associated SNPs for the risk factors) for insomnia, iron deficiency anemias, macular degeneration, peripheral vascular disease and acute reaction to stress.
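A quick sanity check on a GCTA-COJO style table is to recompute the p-value from the effect size and standard error. The sketch below does this for a single record, assuming the eight columns appear in the order given above under the header names `SNP A1 A2 freq b se p N` (an assumption to verify against the actual files) and that the reported p-value comes from a two-sided Wald test. P-values produced by other tests need not match exactly.

```python
import math

# Header assumed to follow the usual GCTA-COJO layout; check the actual files.
COJO_HEADER = ["SNP", "A1", "A2", "freq", "b", "se", "p", "N"]


def parse_cojo_line(line):
    """Convert one whitespace-separated data line into a typed dict."""
    snp, a1, a2, freq, b, se, p, n = line.split()
    return {"SNP": snp, "A1": a1, "A2": a2, "freq": float(freq),
            "b": float(b), "se": float(se), "p": float(p), "N": float(n)}


def two_sided_p(b, se):
    """Two-sided p-value of the Wald statistic b/se under a standard normal."""
    z = abs(b / se)
    return math.erfc(z / math.sqrt(2.0))


# Synthetic record for illustration only (not taken from the released data).
record = parse_cojo_line("rs0000001 A G 0.30 0.05 0.01 5.7e-07 50000")
p_check = two_sided_p(record["b"], record["se"])
print(f"reported p = {record['p']:.2e}, recomputed p = {p_check:.2e}")
```

A large discrepancy between the reported and recomputed values would suggest that columns have been misread or that the effect size is on a different scale than assumed.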
Marioni et al. 2018, Translational Psychiatry
Title: GWAS on family history of Alzheimer’s disease
- Summary statistics: Summary statistics of a genome-wide association study of Alzheimer’s Disease (AD), combining published GWAS results for AD with a GWAS on family history in the UK Biobank. Columns are rs identifier/chr:position_allele1_allele2 (SNP), allele tested for association (A1), alternative allele (A2), estimated effect of A1 (BETA: log-odds), estimated standard error of BETA (SE), association p-value (P), chromosome (CHR), position on the chromosome (BP) and sample size (N). Note that this file was corrected in March 2019.
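Because BETA in this file is on the log-odds scale, it is often convenient to convert it to an odds ratio with a confidence interval. The sketch below shows that conversion, together with a small parser for the SNP column. It assumes a normal approximation for the 95% interval and reads the SNP description as "either an rs identifier or chr:position_allele1_allele2"; both function names and the example values are hypothetical.

```python
import math


def odds_ratio_ci(beta, se, z=1.96):
    """Return (OR, lower, upper) for a log-odds effect and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def parse_snp_id(snp):
    """Split an identifier that is either an rs ID or chr:position_allele1_allele2."""
    if snp.startswith("rs"):
        return {"rsid": snp}
    chrom, rest = snp.split(":", 1)
    pos, allele1, allele2 = rest.split("_")
    return {"chr": chrom, "pos": int(pos), "allele1": allele1, "allele2": allele2}


# Made-up effect size and identifier, for illustration only.
or_, lower, upper = odds_ratio_ci(beta=0.10, se=0.02)
print(f"OR = {or_:.3f} (95% CI {lower:.3f} to {upper:.3f})")
print(parse_snp_id("1:12345_A_G"))
```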
Robinson et al. 2015, Nature Genetics
Title: Population genetic differentiation of height and body mass index across Europe
Within-family genome-wide association statistics from the sibling pair data, generated using the QFAM approach in PLINK. Estimates are for HapMap3 SNPs that passed QC thresholds of MAF > 1%, HWE < 1x10^-6 and imputation info score > 0.4. Columns are chromosome, SNP rs identifier, the coded allele, the coded allele frequency, effect size, approximate standard error, and permutation p-value from 100,000 permutations.
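One practical consequence of permutation-based p-values is that their resolution is bounded by the number of permutations. The sketch below illustrates this with the common add-one empirical estimator, (R + 1) / (N + 1), where R is the number of permuted statistics at least as extreme as the observed one. Whether the released files were produced with exactly this estimator is an assumption, and the function name is hypothetical.

```python
def empirical_p(n_as_extreme, n_permutations):
    """Add-one empirical p-value: (R + 1) / (N + 1)."""
    if not 0 <= n_as_extreme <= n_permutations:
        raise ValueError("n_as_extreme must lie between 0 and n_permutations")
    return (n_as_extreme + 1) / (n_permutations + 1)


N_PERM = 100_000  # number of permutations stated in the description

# Smallest p-value attainable when no permuted statistic exceeds the observed one.
smallest = empirical_p(0, N_PERM)
print(f"smallest attainable p-value: {smallest:.3g}")
```

Under this assumption, p-values in these files cannot fall much below 1e-5, so they are not directly comparable with conventional genome-wide significance thresholds.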
Corrected height data and Corrected BMI data: These files replace the summary statistics file that was publicly released in March 2016 (see README).
The reason for the replacement is that issues were recently reported with the publicly released within-family SNP effect estimates from the Robinson et al. study. These summary data were publicly released in response to a request from J. Pritchard and colleagues for a study that was published later that year (Field et al., Science Vol. 354, Issue 6313, pp. 760-764, 11 Nov 2016). The request from the Field et al. study authors was for statistics in addition to those produced for use in the Robinson et al. study. We therefore re-ran the within-family analysis using PLINK v1.90b3c 64-bit, whereas the within-family estimates used in the published Robinson et al. 2015 paper were produced using PLINK v1.07.
We have now found that PLINK v1.90b3c contained a bug that can result in mishandling of family structure and likely no control for population structure within the analysis (the current software version reports this error as now corrected). Results from PLINK v1.07 used in Robinson et al. 2015 are replicated when using updated estimates from the latest corrected PLINK versions and when using an independent in-house implementation of sib-regression.
We apologise for the inconvenience caused.
Yang et al. 2015, Nature Genetics
Title: Estimation of genetic variance from imputed sequence variants reveals negligible missing heritability for human height and body mass index
- LDSCORE_release_July2015.tar.gz: Per-SNP and per-segment LD scores calculated from 44,126 unrelated individuals and ~17M imputed variants. Columns are SNP, per-SNP LD score, and per-segment LD score.
- GWAS_summary_release_July2015.tar.gz: GWAS summary data. Columns are SNP, the coded allele, effect size, and standard error.
Lloyd-Jones et al. 2017, American Journal of Human Genetics
Title: The Genetic Architecture of Gene Expression in Peripheral Blood