The following are links to download summary data for papers written within the centre, listed in reverse chronological order.
Niarchou et al. 2020, Translational Psychiatry
Title: Genome-wide association study of dietary intake in the UK biobank study and its associations with schizophrenia and other traits.
- Stdres1_ukbEUR_hrc: Summary statistics of the meat-related diet component (DC1).
- Stdres2_ukbEUR_hrc: Summary statistics of the fish and plant-based related diet component (DC2).
Vallerga et al. 2020, Nature Communications
Title: Analysis of DNA methylation associates the cystine-glutamate antiporter SLC7A11 with risk of Parkinson’s disease
- Vallerga2020_NCOMMS_MWAS-meta-analysis-MOA.mlma: Summary statistics of a meta-analysis of methylome-wide association studies of Parkinson’s disease in the SGPD and PEG cohorts using mixed linear model-based omic association (MOA). Columns are chromosome (Chr), probe (Probe), genomic position of the probe on hg19 (BP), gene name (Gene), orientation (orien), effect size (b), standard error (se) and association p-value (p).
- Vallerga2020_NCOMMS_MWAS-meta-analysis-MOMENT.mlma: Summary statistics of a meta-analysis of methylome-wide association studies of Parkinson’s disease in the SGPD and PEG cohorts using multi-component mixed linear model-based omic association excluding the target (MOMENT). Columns are chromosome (Chr), probe (Probe), genomic position of the probe on hg19 (BP), gene name (Gene), orientation (orien), effect size (b), standard error (se) and association p-value (p).
- Vallerga2020_NCOMMS_ancestry.txt: Genetic ancestry of 1885 individuals passing DNA methylation and SNP genotyping quality control (QC). Columns are sample ID (Sample_ID), first principal component (PC1), second principal component (PC2), a flag denoting whether the sample passed QC (Passed_QC), a flag denoting European ancestry (European_ancestry) and a flag denoting whether the sample was included in the final DNA methylation dataset (Final_dataset).
- Vallerga2020_NCOMMS_DNAm-QCed-samples.grm.bin & Vallerga2020_NCOMMS_DNAm-QCed-samples.grm.id & Vallerga2020_NCOMMS_DNAm-QCed-samples.grm.N.bin: Genetic relationship matrix (GRM) based on 7,582,086 SNPs for 1885 of 1889 individuals passing DNA methylation and SNP genotyping quality control (QC). The remaining four individuals failed to pass sample-based genotyping QC. The GRM was used to identify related individuals in the DNA methylation dataset. The final set of 1638 unrelated individuals in the DNA methylation study are identified in Vallerga2020_NCOMMS_ancestry.txt.
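To inspect the GRM directly, the sketch below reads the .grm.bin/.grm.id pair in Python. It assumes the standard GCTA binary layout (lower triangle of the matrix, diagonal included, stored row by row as little-endian 4-byte floats, with sample IDs in the .grm.id file) and does not read the .grm.N.bin file. The function names and the relatedness cutoff of 0.05 are hypothetical choices for illustration, not necessarily the threshold used in Vallerga et al. 2020.

```python
import numpy as np


def read_grm(prefix):
    """Read a GCTA binary GRM into a full symmetric matrix.

    Assumes <prefix>.grm.id holds 'FID IID' per line and <prefix>.grm.bin
    holds the lower triangle (diagonal included), row by row, as float32.
    """
    with open(prefix + ".grm.id") as fh:
        ids = [tuple(line.split()[:2]) for line in fh if line.strip()]
    n = len(ids)
    tri = np.fromfile(prefix + ".grm.bin", dtype="<f4")
    expected = n * (n + 1) // 2
    if tri.size != expected:
        raise ValueError(f"expected {expected} values for {n} samples, found {tri.size}")
    grm = np.zeros((n, n), dtype=np.float64)
    # np.tril_indices walks the lower triangle row by row, matching the file order
    rows, cols = np.tril_indices(n)
    grm[rows, cols] = tri
    grm[cols, rows] = tri
    return ids, grm


def related_pairs(ids, grm, cutoff=0.05):
    """List off-diagonal pairs whose relatedness exceeds a (hypothetical) cutoff."""
    rows, cols = np.tril_indices(len(ids), k=-1)
    mask = grm[rows, cols] > cutoff
    return [(ids[i], ids[j], float(grm[i, j])) for i, j in zip(rows[mask], cols[mask])]
```

For example, `ids, grm = read_grm("Vallerga2020_NCOMMS_DNAm-QCed-samples")` loads the full 1885 × 1885 matrix. The final unrelated set is already flagged in Vallerga2020_NCOMMS_ancestry.txt, so this is only a way to explore the relationships.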
Sidorenko and Kassam et al. 2019
Title: The effect of X-linked dosage compensation on complex trait variation
- UKBv3_Xchr_20traits.tar.gz: Summary statistics of X-chromosome association studies in the UK Biobank for 20 complex traits.
- chrX_eqtl_besd.tar.gz: X-chromosome eQTL summary statistics from the CAGE consortium and GTEx in SMR format.
Wu et al. 2019, Nature Communications
Title: Genome-wide association study of medication-use and associated disease in the UK Biobank
- 23_medication-taking_GWAS_summary_statistics: Summary statistics of genome-wide association studies for 23 medication-taking traits.
- 23_medication-taking_GWAS_summary_statistics_README.pdf: Description of the dataset.
Yap et al. 2018, Nature Communications
Title: Dissection of genetic variation and evidence for pleiotropy in male pattern baldness
- mpb_bolt_lmm_aut_x.tab.zip: Summary statistics for the male pattern baldness (MPB) genome-wide association study.
- MPB_GWAS_summary_statistics_README.pdf: Description of the dataset.
Xue et al. 2018, Nature Communications
Title: Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes
- Xue_et_al_T2D_META_Nat_Commun_2018.gz: Summary statistics of genome-wide association of type 2 diabetes.
- Xue_et_al_T2D_META_Nat_Commun_2018.pdf: Description of the dataset.
Yengo et al. 2018, Human Molecular Genetics
Title: Meta-analysis of genome-wide association studies for height and body mass index in ~700,000 individuals of European ancestry
- Meta-analysis_Locke_et_al+UKBiobank_2018.txt.gz: Summary statistics of the genome-wide association meta-analysis of body mass index. Columns are chromosome (CHR), genomic position on human genome build 37/hg19 (POS), SNP rs identifier (SNP), allele tested for association (Tested_Allele), other allele (Other_Allele), frequency of the tested allele in the Health and Retirement Study (Freq_Tested_Allele_in_HRS), estimated effect of the tested allele (BETA), standard error of the effect estimate (SE), association p-value (P) and sample size (N).
- Meta-analysis_Wood_et_al+UKBiobank_2018.txt.gz: Summary statistics of the genome-wide association meta-analysis of height. Columns are chromosome (CHR), genomic position on human genome build 37/hg19 (POS), SNP rs identifier (SNP), allele tested for association (Tested_Allele), other allele (Other_Allele), frequency of the tested allele in the Health and Retirement Study (Freq_Tested_Allele_in_HRS), estimated effect of the tested allele (BETA), standard error of the effect estimate (SE), association p-value (P) and sample size (N).
- 610_genes_prioritised_with_SMR_for_height_Yengo_et_al_HMG_2018.txt and 110_genes_prioritised_with_SMR_for_BMI_Yengo_et_al_HMG_2018.txt: Results from Summary-data-based Mendelian Randomisation (SMR) analyses used to prioritise genes whose local genetic control of expression correlates with that of the focal traits in the genome-wide association studies of height and body mass index (BMI) (Yengo et al. 2018, HMG). These analyses prioritised 610 genes for height and 110 genes for BMI. Each file contains a table with 21 columns as described here (http://cnsgenomics.com/software/smr/#SMR&HEIDIanalysis); the last column (“tissue”) indicates the tissue in which gene expression was measured. SMR analyses were performed using expression QTL (eQTL) data from the GTEx project.
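As an illustration, a minimal Python sketch for pulling genome-wide significant SNPs out of the height or BMI files follows. It assumes the first line is a header with the column names listed above and that fields are whitespace-separated. The function name and the conventional 5 × 10^-8 threshold are illustrative choices, not necessarily the criteria used in the paper.

```python
import gzip


def significant_snps(path, p_threshold=5e-8):
    """Yield SNPs with P below p_threshold from a gzipped summary file.

    Assumes a whitespace-separated header containing CHR, POS, SNP,
    Tested_Allele, BETA, SE, P and N (as described for the Yengo et al. files).
    """
    with gzip.open(path, "rt") as fh:
        header = fh.readline().split()
        for line in fh:
            fields = line.split()
            if not fields:
                continue  # skip blank lines
            row = dict(zip(header, fields))
            if float(row["P"]) < p_threshold:
                yield {
                    "SNP": row["SNP"],
                    "CHR": row["CHR"],
                    "POS": int(row["POS"]),
                    "Tested_Allele": row["Tested_Allele"],
                    "BETA": float(row["BETA"]),
                    "SE": float(row["SE"]),
                    "P": float(row["P"]),
                    "N": float(row["N"]),  # meta-analysis N may be non-integer
                }
```

Because the function is a generator, it streams the file line by line rather than loading the whole file into memory.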
Zhu et al. 2018, Nature Communications
Title: Causal associations between risk factors and common diseases inferred from GWAS summary data
The summary-level GWAS data for 23 phenotypes are from GERA and UK Biobank. Each data set is provided as a whitespace-separated table in GCTA-COJO format. Columns are SNP, the effect allele, the other allele, frequency of the effect allele, effect size, standard error, p-value and sample size.
- GERA data: Details of quality control of the genotyped and imputed data can be found in Zhu et al. (2018 Nat. Commun.). The individual-level ICD-9 codes were classified into 22 common diseases. We added an additional trait, ‘Disease Count’ (the number of diseases affecting each individual), as a crude measure of each individual's general health status.
- UK Biobank data: Details of quality control of the genotyped and imputed data can be found in Zhu et al. (2018 Nat. Commun.). Individual-level ICD-10 codes were available in the UKB data. To match the diseases in GERA, we classified the phenotypes into 22 common diseases by mapping the ICD-10 codes onto the ICD-9 classifications used in GERA, taking self-reported disease status into account. Note that we did not perform the association analysis for dermatophytosis because the number of cases was too small. For insomnia, iron deficiency anemias, macular degeneration, peripheral vascular disease and acute reaction to stress, we performed the association analyses only on a subset of SNPs (those in common with the top associated SNPs for the risk factors).
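A minimal Python sketch for handling records in the COJO-style layout described above follows. It parses one data line (the header line is assumed to be skipped), recomputes a two-sided p-value from z = b/se as a consistency check, and re-expresses a record relative to the other allele. This last step is often needed when aligning GERA and UK Biobank alleles. The record and function names are hypothetical, and the sketch assumes each line has exactly the eight listed columns in that order.

```python
import math
from typing import NamedTuple


class CojoRecord(NamedTuple):
    snp: str
    a1: str      # effect allele
    a2: str      # other allele
    freq: float  # frequency of the effect allele
    b: float     # effect size
    se: float    # standard error
    p: float     # p-value
    n: float     # sample size


def parse_cojo_line(line):
    """Parse one whitespace-separated data line (not the header)."""
    snp, a1, a2, freq, b, se, p, n = line.split()
    return CojoRecord(snp, a1, a2, float(freq), float(b), float(se), float(p), float(n))


def p_from_z(b, se):
    """Two-sided normal p-value for z = b / se, i.e. 2 * (1 - Phi(|z|))."""
    z = b / se
    return math.erfc(abs(z) / math.sqrt(2))


def flip(rec):
    """Express the record relative to the other allele: swap alleles, negate b, 1 - freq."""
    return rec._replace(a1=rec.a2, a2=rec.a1, freq=1.0 - rec.freq, b=-rec.b)
```

Comparing `p_from_z(rec.b, rec.se)` with `rec.p` gives a quick sanity check after reading a file. Large discrepancies usually point to a column mix-up rather than a problem with the data.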
Marioni et al. 2018, Translational Psychiatry
Title: GWAS on family history of Alzheimer’s disease
- Summary statistics: Summary statistics of a genome-wide association meta-analysis of Alzheimer’s disease (AD), combining published GWAS results for AD with a GWAS of family history of AD in the UK Biobank. Columns are rs identifier or chr:position_allele1_allele2 (SNP), allele tested for association (A1), other allele (A2), estimated effect of A1 on the log-odds scale (BETA), standard error of BETA (SE), association p-value (P), chromosome (CHR), position on the chromosome (BP) and sample size (N). Note that this file was corrected in March 2019.
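Because BETA in this file is on the log-odds scale, an odds ratio for A1 and an approximate 95% Wald confidence interval can be obtained by exponentiating. A minimal Python sketch follows. The function name is hypothetical, and the normal-approximation interval is a standard convention rather than something reported in the file.

```python
import math


def odds_ratio_ci(beta, se, z=1.959964):
    """Convert a log-odds effect and its SE to (OR, lower, upper) using a Wald interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)
```

For example, `odds_ratio_ci(0.0, 0.1)` returns an odds ratio of 1.0 with bounds of roughly 0.82 and 1.22.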
Lloyd-Jones et al. 2017, American Journal of Human Genetics
Title: The Genetic Architecture of Gene Expression in Peripheral Blood
- Use the online web app to download the full set of summary statistics, or download them in the Summary-data-based Mendelian Randomization (SMR) binary format from the SMR page.
Robinson et al. 2015, Nature Genetics
Title: Population genetic differentiation of height and body mass index across Europe
Within-family genome-wide association statistics from the sibling-pair data, generated using the QFAM approach in PLINK. Estimates are for HapMap3 SNPs that passed the QC thresholds of MAF > 1%, an HWE p-value threshold of 1 × 10^-6 and an imputation info score > 0.4. Columns are chromosome, SNP rs identifier, coded allele, coded allele frequency, effect size, approximate standard error and permutation p-value from 100,000 permutations.
Corrected height data and Corrected BMI data: These files replace the summary statistics file that was publicly released in March 2016 (see README).
The replacement was made because issues were recently reported with the publicly released within-family SNP effect estimates from the Robinson et al. study. These summary data had been released in response to a request from J. Pritchard and colleagues for a study published later that year (Field et al., Science Vol. 354, Issue 6313, pp. 760-764, 11 Nov 2016). Because the Field et al. authors requested statistics in addition to those produced for the Robinson et al. study, we re-ran the within-family analysis using PLINK v1.90b3c 64-bit, whereas the within-family estimates used in the published 2015 Robinson et al. paper were produced using PLINK v1.07.
We have now found that PLINK v1.90b3c contained a bug that can cause family structure to be mishandled, which likely left population structure uncontrolled in the analysis (the bug is reported as fixed in the current PLINK version). The PLINK v1.07 results used in Robinson et al. 2015 are replicated both by the latest corrected PLINK versions and by an independent in-house implementation of sib-regression.
We apologise for the inconvenience caused.
Yang et al. 2015, Nature Genetics
Title: Estimation of genetic variance from imputed sequence variants reveals negligible missing heritability for human height and body mass index
- LDSCORE_release_July2015.tar.gz: Per-SNP and per-segment LD scores calculated from 44,126 unrelated individuals and ~17M imputed variants. Columns are SNP, per-SNP LD score and per-segment LD score.
- GWAS_summary_release_July2015.tar.gz: GWAS summary data. Columns are SNP, the coded allele, effect size, and standard error.
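The per-SNP LD score of SNP j is commonly defined as the sum of squared correlations r² between SNP j and all SNPs k (including j itself) within a window around it. The minimal Python sketch below shows that definition on a small genotype matrix. The 1 Mb window and the plain (unadjusted) r² are assumptions for illustration; this is not the pipeline used to produce the released scores, and the per-segment score is not reproduced here.

```python
import numpy as np


def ld_scores(genotypes, positions, window_bp=1_000_000):
    """Per-SNP LD score: sum of r^2 with every SNP within +/- window_bp (itself included).

    genotypes: (n_individuals, n_snps) array of allele dosages.
    positions: base-pair positions, in the same order as the columns.
    Assumes no monomorphic SNPs (zero variance would divide by zero).
    """
    g = np.asarray(genotypes, dtype=float)
    pos = np.asarray(positions)
    # Standardise each SNP so that (g.T @ g) / n is the Pearson correlation matrix
    g = (g - g.mean(axis=0)) / g.std(axis=0)
    r = (g.T @ g) / g.shape[0]
    in_window = np.abs(pos[:, None] - pos[None, :]) <= window_bp
    return (r ** 2 * in_window).sum(axis=1)
```

This builds the full m × m correlation matrix. That is fine for a toy example, but for millions of variants the computation has to be done window by window.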
Hemani et al. 2013, American Journal of Human Genetics
Title: Inference of the Genetic Architecture Underlying BMI and Height with the Use of 20,240 Sibling Pairs
- hemani_pihat.txt: Estimated genome-wide realised additive (first column) and dominance (second column) coefficients from 20,240 sibling pairs. If the genome-wide average posterior probabilities of sharing 0, 1 and 2 alleles identical-by-descent are P0, P1 and P2, then the first column contains (0.5*P1 + P2) and the second column contains P2.
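The two columns follow directly from the identical-by-descent (IBD) probabilities, as the minimal Python sketch below shows. The function name is hypothetical. For full siblings the expected probabilities are P0 = 1/4, P1 = 1/2 and P2 = 1/4, giving expected coefficients of 0.5 and 0.25. The file reports the realised values for each pair, which scatter around these expectations.

```python
def realised_coefficients(p0, p1, p2):
    """Return (additive, dominance) = (0.5 * P1 + P2, P2) from IBD sharing probabilities."""
    if abs(p0 + p1 + p2 - 1.0) > 1e-6:
        raise ValueError("IBD probabilities must sum to 1")
    return 0.5 * p1 + p2, p2
```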