Data | Program in Complex Trait Genomics

The following are links to download summary data for papers written within the centre, listed in reverse chronological order.

Wang et al. 2026 Nature Communications

Title: Distinct genetic profiles influence body mass index between infancy and adolescence
Summary statistics associated with the publication: https://doi.org/10.1038/s41467-026-69310-6

ALSPAC_RRM_int.txt.gz: Summary statistics for GWAS of the individual intercept (random effect) of body mass index trajectory estimated by the random regression model in unrelated ALSPAC individuals with European ancestry.
ALSPAC_RRM_slope.txt.gz: Summary statistics for GWAS of the individual linear slope (random effect) of body mass index trajectory estimated by the random regression model in unrelated ALSPAC individuals with European ancestry.
ALSPAC_RRM_slope2.txt.gz: Summary statistics for GWAS of the individual quadratic slope (random effect) of body mass index trajectory estimated by the random regression model in unrelated ALSPAC individuals with European ancestry.
ALSPAC_RRM_eigen1.txt.gz: Summary statistics for GWAS of the first principal component of body mass index trajectory estimated by the random regression model in unrelated ALSPAC individuals with European ancestry.
ALSPAC_RRM_eigen2.txt.gz: Summary statistics for GWAS of the second principal component of body mass index trajectory estimated by the random regression model in unrelated ALSPAC individuals with European ancestry.

Kemper et al. 2024 Nature Communications

Title: Genetic influence on within-person longitudinal change in anthropometric traits in the UK Biobank
Summary statistics associated with the publication: https://www.nature.com/articles/s41467-024-47802-7

bmiRateChange_ageCorrected.fastGWA.gz: Summary statistics for age (and sex) corrected rate of change in BMI (kg/m2 per year) in unrelated UK Biobank individuals with European ancestry.
heightRateChange_ageCorrected.fastGWA.gz: Summary statistics for age (and sex) corrected rate of change in height (cm per year) in unrelated UK Biobank individuals with European ancestry.
sitRateChange_ageCorrected.fastGWA.gz: Summary statistics for age (and sex) corrected rate of change in sitting height (cm per year) in unrelated UK Biobank individuals with European ancestry.
weightRateChange_ageCorrected.fastGWA.gz: Summary statistics for age (and sex) corrected rate of change in weight (kg per year) in unrelated UK Biobank individuals with European ancestry.
caseControl_repeatMeasure_singleMeasure.fastGWA.gz: Summary statistics from a case:control analysis to determine degree of bias in individuals returning for repeated measures in the UK Biobank. All individuals are unrelated and of European ancestry.

Wang et al. 2023 PLOS Genetics

Title: Cross-ancestry analyses identify new genetic loci associated with 25-hydroxyvitamin D
Summary statistics associated with the publication: tba

Wang_2023_25OHD_Readme.pdf: Description of the dataset.
Wang_2023_25OHD_AFR.gz: Summary statistics of 25OHD GWAS in UK Biobank participants of inferred African ancestry.
Wang_2023_25OHD_EAS.gz: Summary statistics of 25OHD GWAS in UK Biobank participants of inferred East Asian ancestry.
Wang_2023_25OHD_AFR.gz: Summary statistics of 25OHD GWAS in UK Biobank participants of inferred South Asian ancestry.
Wang_2023_25OHD_EUR_DomGWAS.gz: Summary statistics of a dominance GWAS of 25OHD in unrelated individuals of inferred European ancestry.
Wang_2023_25OHD_EUR_Dark_skin.gz: Summary statistics of 25OHD GWAS in UK Biobank participants of inferred European ancestry and self-reported dark skin colour.
Wang_2023_25OHD_EUR_Light_skin.gz: Summary statistics of 25OHD GWAS in UK Biobank participants of inferred European ancestry and self-reported light skin colour.
Wang_2023_25OHD_EUR_SCS_meta.gz: Results from a fixed-effect inverse-variance weighted meta-analysis of the skin colour stratified GWAS.
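
For readers unfamiliar with the method named above, the following is a minimal sketch of a fixed-effect inverse-variance weighted meta-analysis for a single SNP. It is not the authors' pipeline: the function name `ivw_fixed_effect` and its arguments are hypothetical, and it assumes each stratum supplies an effect estimate and standard error for the same tested allele.

```python
import math


def ivw_fixed_effect(betas, ses):
    """Combine per-stratum effects for one SNP with inverse-variance weights.

    betas: effect estimates from each stratum (same tested allele assumed).
    ses:   matching standard errors, all strictly positive.
    Returns (pooled effect, pooled standard error, z-score).
    """
    if not betas or len(betas) != len(ses):
        raise ValueError("betas and ses must be non-empty and of equal length")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")

    # Each stratum is weighted by the inverse of its sampling variance.
    weights = [1.0 / (se * se) for se in ses]
    total_weight = sum(weights)

    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled_beta, pooled_se, pooled_beta / pooled_se
```

With two strata of equal precision the pooled effect is the simple average of the two estimates, and the pooled standard error shrinks by a factor of the square root of two.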

Wu et al. 2023

Title: GWAS summary statistics and SNP weights for polygenic score calculation
Summary statistics associated with the publication: wu_et_al_2023_SummaryStatistics.pdf

AgeO_EUR: GWAS summary statistics for AgeO generated based on individuals of European ancestry.
DivD_EUR: GWAS summary statistics for DivD generated based on individuals of European ancestry.
SNPWeights: SNP weights of DivD-EUR for polygenic score calculation.

Wu et al. 2021

Title: GWAS summary statistics for peptic ulcer disease and other gastrointestinal disorders
Summary statistics associated with the publication: https://cnsgenomics.com/sites/default/files/PCTGWebsiteSummaryStatistics.pdf

PUD_summary: peptic ulcer disease (PUD)
GORD_summary: gastro-oesophageal reflux disease (GORD)
PGM_summary: combinations of PUD, GORD and their medications (PG+M)
IBS_summary: irritable bowel syndrome (IBS)
IBD_summary: inflammatory bowel diseases (IBD)

Couvy-Duchesne et al. 2020

Title: A unified framework for association and prediction from vertex‐wise grey‐matter structure
Summary statistics associated with the publication: https://www.nature.com/articles/s41525-020-0118-3

BLUP_weights_BslnCovariates.zip: BLUP weights calculated from the UK Biobank (first 10K individuals), that can be used to generate grey-matter based prediction on independent cohorts.

Nabais et al. 2020

Title: Significant out-of-sample classification from methylation profile scoring for amyotrophic lateral sclerosis
Summary statistics from: https://www.nature.com/articles/s41525-020-0118-3

AUS_ALS_PCTG_qced_normalized_DNAm_autosomes_adjusted_no-xreact-no-SNP-sd0.02_MOA.mlma: MOA summary statistics from discovery Australian ALS cohort.
AUS_ALS_PCTG_qced_normalized_DNAm_autosomes_adjusted_no-xreact-no-SNP-sd0.02_MOMENT.mlma: MOMENT summary statistics from discovery Australian ALS cohort.
Netherlands_ALS_qced_normalized_autosomes_adjusted_no-xreact-no-SNP-sd0.02_MOMENT.mlma: MOMENT summary statistics from replication Netherlands ALS cohort.

Revez et al. 2020, Nature Communications

Title: Genome-wide association study identifies 143 loci associated with 25 hydroxyvitamin D concentration
Summary statistics from: https://www.nature.com/articles/s41467-020-15421-7

Revezetal2020_25OHD_Readme.pdf: Description of the dataset.
Revezetal2020_25OHD.gz: Summary statistics of the 25 hydroxyvitamin D (25OHD) genome-wide association study (GWAS) with no BMI correction.
Revezetal2020_25OHD_BMIcov.gz: Summary statistics of 25OHD GWAS with BMI as covariate.
Revezetal2020_25OHD_BMIcond.gz: Summary statistics of 25OHD GWAS conditioned on BMI with mtCOJO.
Revezetal2020_25OHD_log: Summary statistics of 25OHD GWAS used in meta-analysis with summary statistics of the SUNLIGHT consortium.
Revezetal2020_25OHD_SUNLIGHTmeta: Summary statistics of the meta-analysis between the UKB 25OHD GWAS and the SUNLIGHT consortium GWAS.
smr_epi_plots.zip: omics SMR plots.

Niarchou et al. 2020 Translational Psychiatry

Title: Genome-wide association study of dietary intake in the UK biobank study and its associations with schizophrenia and other traits

Stdres1_ukbEUR_hrc: Summary statistics of the meat-related diet component (DC1).
Stdres2_ukbEUR_hrc: Summary statistics of the fish and plant-based related diet component (DC2).

Vallerga et al. 2020, Nature Communications

Title: Analysis of DNA methylation associates the cystine-glutamate antiporter SLC7A11 with risk of Parkinson’s disease

Vallerga2020_NCOMMS_MWAS-meta-analysis-MOA.mlma: Summary statistics of a meta-analysis of methylome-wide association studies of Parkinson’s disease in the SGPD and PEG cohorts using mixed linear model-based omic association (MOA). Columns are chromosome (Chr), probe (Probe), genomic position of the probe on hg19 (BP), gene name (Gene), orientation (orien), effect size (b), standard error (se) and association p-value (p).
Vallerga2020_NCOMMS_MWAS-meta-analysis-MOMENT.mlma: Summary statistics of a meta-analysis of methylome-wide association studies of Parkinson’s disease in the SGPD and PEG cohorts using multi component mixed linear model-based omic association excluding the target (MOMENT). Columns are chromosome (Chr), probe (Probe), genomic position of the probe on hg19 (BP), gene name (Gene), orientation (orien), effect size (b), standard error (se) and association p-value (p).
Vallerga2020_NCOMMS_ancestry.txt: Genetic ancestry of 1885 individuals passing DNA methylation and SNP genotyping quality control (QC). Columns are sample ID (Sample_ID), first principal component (PC1), second principal component (PC2), a flag denoting if the sample passed QC (Passed_QC), a flag denoting European ancestry (European_ancestry), a flag denoting if the sample was included in the final DNA methylation dataset (Final_dataset).
Vallerga2020_NCOMMS_DNAm-QCed-samples.grm.bin & Vallerga2020_NCOMMS_DNAm-QCed-samples.grm.id & Vallerga2020_NCOMMS_DNAm-QCed-samples.grm.N.bin: Genetic relationship matrix (GRM) based on 7,582,086 SNPs for 1885 of 1889 individuals passing DNA methylation and SNP genotyping quality control (QC). The remaining four individuals failed to pass sample-based genotyping QC. The GRM was used to identify related individuals in the DNA methylation dataset. The final set of 1638 unrelated individuals in the DNA methylation study are identified in Vallerga2020_NCOMMS_ancestry.txt.

Abdellaoui et al. 2019

Title: Supplementary Animations for “Genetic Correlates of Social Stratification in Great Britain”

In Abdellaoui et al. (2019), we looked at the geographic distribution of human DNA in Great Britain using the UK Biobank dataset (N ~450,000 people of European descent). This dataset provided us with the statistical power to look beyond the expected strong relationship between geography and ancestry to the thus far unexplored relationship between geography and complex trait variation. We analyzed and discussed the geographic distribution of many genome-wide aggregate measures, more than we could display in the article. We have now visualized them in the animations here.

Sidorenko and Kassam et al. 2019

Title: The effect of X-linked dosage compensation on complex trait variation

UKBv3_Xchr_20traits.tar.gz: Summary statistics of X-chromosome association studies in the UK Biobank for 20 complex traits.
chrX_eqtl_besd.tar.gz: X-chromosome eQTL summary statistics from the CAGE consortium and GTEx in SMR format.

Wu et al. 2019, Nature Communications

Title: Genome-wide association study of medication-use and associated disease in the UK Biobank

23_medication-taking_GWAS_summary_statistics: Summary statistics of genome-wide association studies for 23 medication-taking traits.
23_medication-taking_GWAS_summary_statistics_README.pdf: Description of the dataset.

Yap et al. 2018, Nature Communications

Title: Dissection of genetic variation and evidence for pleiotropy in male pattern baldness

mpb_bolt_lmm_aut_x.tab.zip: Summary statistics for the male pattern baldness (MPB) genome-wide association study.
MPB_GWAS_summary_statistics_README.pdf: Description of the dataset.

Xue et al. 2018, Nature Communications

Title: Genome-wide association analyses identify 143 risk variants and putative regulatory mechanisms for type 2 diabetes

Xue_et_al_T2D_META_Nat_Commun_2018.gz: Summary statistics of genome-wide association of type 2 diabetes.
Xue_et_al_T2D_META_Nat_Commun_2018.pdf: Description of the dataset.

Yengo et al. 2018, Human Molecular Genetics

Title: Meta-analysis of genome-wide association studies for height and body mass index in ~700,000 individuals of European ancestry

Meta-analysis_Locke_et_al+UKBiobank_2018.txt.gz: Summary statistics of genome-wide association of body mass index. Columns are chromosome (CHR), genomic position (POS) on the human genome 37 (hg19), SNP rs identifier (SNP), Allele tested for association (Tested_Allele), Alternative allele (Other_Allele), Frequency of the tested allele in the Health and Retirement Study (Freq_Tested_Allele_in_HRS), estimated effect of the tested allele (BETA), estimated standard error of the effect size of the tested allele (SE), Association p-value (P) and sample size (N).
Meta-analysis_Wood_et_al+UKBiobank_2018.txt.gz: Summary statistics of genome-wide association of height. Columns are chromosome (CHR), genomic position (POS) on the human genome 37 (hg19), SNP rs identifier (SNP), Allele tested for association (Tested_Allele), Alternative allele (Other_Allele), Frequency of the tested allele in the Health and Retirement Study (Freq_Tested_Allele_in_HRS), estimated effect of the tested allele (BETA), estimated standard error of the effect size of the tested allele (SE), Association p-value (P) and sample size (N).
610_genes_prioritised_with_SMR_for_height_Yengo_et_al_HMG_2018.txt and 110_genes_prioritised_with_SMR_for_BMI_Yengo_et_al_HMG_2018.txt: Results from Summary-data-based Mendelian Randomisation (SMR) analyses that prioritise genes whose local genetic control correlates with that of the focus traits in the genome-wide association studies of height and body mass index (BMI; Yengo et al. 2018, HMG). These analyses prioritised 610 and 110 genes associated with height and BMI, respectively. Each file contains tables with 21 columns as described here (http://cnsgenomics.com/software/smr/#SMR&HEIDIanalysis), and the last column (“tissue”) indicates the tissue in which gene expression was measured. SMR analyses were performed using expression QTL (eQTL) data from the GTEx project.
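
As an illustration of working with the column layout described for the two meta-analysis files, here is a small sketch that reads a gzipped summary-statistics file and keeps SNPs below a p-value threshold. It assumes the files are whitespace-delimited with a header row that uses the column names given in parentheses above. The function names are hypothetical, and the default threshold of 5e-8 is the conventional genome-wide significance level, not a value taken from this page.

```python
import gzip


def parse_rows(lines):
    """Yield one dict per SNP from whitespace-delimited lines with a header row.

    Only BETA, SE and P are converted to floats; other fields stay as strings.
    Rows whose field count differs from the header are skipped.
    """
    lines = iter(lines)
    header = next(lines).split()
    for line in lines:
        fields = line.split()
        if len(fields) != len(header):
            continue
        row = dict(zip(header, fields))
        for key in ("BETA", "SE", "P"):
            row[key] = float(row[key])
        yield row


def genome_wide_significant(rows, threshold=5e-8):
    """Return the rows whose association p-value is below the threshold."""
    return [row for row in rows if row["P"] < threshold]


def read_summary_gz(path, threshold=5e-8):
    """Open a gzipped summary-statistics file and filter it in one pass."""
    with gzip.open(path, "rt") as handle:
        return genome_wide_significant(parse_rows(handle), threshold)
```

Calling `read_summary_gz` on a downloaded copy of either `.txt.gz` file would return the filtered rows as dictionaries keyed by column name, provided the layout assumptions above hold.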

Zhu et al. 2018, Nature Communications

Title: Causal associations between risk factors and common diseases inferred from GWAS summary data

The summary-level GWAS data for 23 phenotypes were from GERA and UK Biobank. Each data set has been made available as a whitespace-separated table in GCTA-COJO format. Columns are SNP, the effect allele, the other allele, frequency of the effect allele, effect size, standard error, p-value and sample size.

GERA data: Details of quality controls of the genotyped and imputed data can be found in Zhu et al. (2018 Nat. Commun.). The individual-level ICD-9 codes were classified into 22 common diseases. We added an additional trait ‘Disease Count’ (a count of the number of diseases affecting each individual) as a crude measure of general health status of each individual.
UK Biobank data: Details of quality controls of the genotyped and imputed data can be found in Zhu et al. (2018 Nat. Commun.). Individual-level ICD-10 codes were available in the UKB data. To match the diseases in GERA, we classified the phenotypes into 22 common diseases by projecting the ICD-10 codes to the classifications of ICD-9 codes in GERA taking into account the self-reported disease status. Note that we did not perform the association analysis for dermatophytosis because the number of cases was too small. We only performed the association analyses on a subset of SNPs (in common with the top associated SNPs for the risk factors) for insomnia, iron deficiency anemias, macular degeneration, peripheral vascular disease and acute reaction to stress.

Marioni et al. 2018, Translational Psychiatry

Title: GWAS on family history of Alzheimer’s disease

Summary statistics: Summary statistics of a genome-wide association of Alzheimer’s Disease (AD), combining published GWAS results for AD with a GWAS on family history in the UK Biobank. Columns are rs identifier/chr:position_allele1_allele2 (SNP), allele tested for association (A1), alternative allele (A2), estimated effect of A1 (BETA: log-odds), estimated standard error of BETA (SE), association p-value (P), chromosome (CHR), position on the chromosome (BP) and sample size (N). Note that this file was corrected in March 2019.
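
Because BETA in this file is on the log-odds scale, users often want an odds ratio with a confidence interval. The sketch below shows the standard conversion under a normal approximation; the function name `odds_ratio_with_ci` and the default 95% level are illustrative choices, not part of the dataset.

```python
import math
from statistics import NormalDist


def odds_ratio_with_ci(beta, se, level=0.95):
    """Convert a log-odds effect and its standard error to an odds ratio.

    Returns (odds ratio, lower bound, upper bound), using a normal
    approximation on the log-odds scale for the confidence interval.
    """
    if se <= 0:
        raise ValueError("standard error must be positive")
    if not 0.0 < level < 1.0:
        raise ValueError("level must be between 0 and 1")

    # Two-sided critical value, about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return (
        math.exp(beta),
        math.exp(beta - z * se),
        math.exp(beta + z * se),
    )
```

A BETA of zero corresponds to an odds ratio of exactly 1, and the interval is symmetric on the log scale rather than on the odds-ratio scale.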

Sanjak et al., 2017

Title: Evidence of directional and stabilizing selection in contemporary humans

Summary statistics associated with the publication: https://www.pnas.org/doi/full/10.1073/pnas.1707227114

rLRS_plink_HM3_unrel.zip: Summary statistics from GWAS analysis of rLRS in the UKB dataset of unrelated males and females (n~158,000 females and n~116,000 males).

Benyamin et al 2017, Nature Communications

Title: Cross-ethnic meta-analysis identifies association of the GPX3-TNIP1 locus with amyotrophic lateral sclerosis

BenyaminEtAl_NatComm_Data.zip

DOI: 10.1038/s41467-017-00471-1

Lloyd-Jones et al. 2017, American Journal of Human Genetics

Title: The Genetic Architecture of Gene Expression in Peripheral Blood

See online web app to download a full set of summary statistics or download the Summary Mendelian Randomization binary format at SMR.

Shah et al. 2015

Title: Improving Phenotypic Prediction by Combining Genetic and Epigenetic Associations

BMI_EWAS_results.txt.gz: Contains BMI EWAS summary data from the LifeLines cohort.
bmi_EWAS_results_cellCorrected_noCpgSNPprobes.txt.gz: Contains BMI EWAS summary data from the Lothian Birth Cohorts used in the 2015 Shah et al AJHG paper (PMID 26119815). Briefly, beta values were logit transformed: log(beta/(1 − beta)). For removal of variation due to batch effects and covariates, the logit-transformed beta values were regressed onto the technical variables (plate, array, and array position) and covariates (sex, age, measured cell count). Residuals from this linear regression were inverse-normal transformed and used in all subsequent analyses. Please refer to the paper for data QC.
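
The two transformations mentioned above can be sketched as follows. This is an illustration only, not the code used in the paper: the regression step on technical variables and covariates is omitted, and the rank offset of 0.5 and the tie handling (ties broken by input order) in `inverse_normal` are assumptions, since the page does not specify them.

```python
import math
from statistics import NormalDist


def logit(beta):
    """Logit-transform a methylation beta value: log(beta / (1 - beta))."""
    if not 0.0 < beta < 1.0:
        raise ValueError("beta must lie strictly between 0 and 1")
    return math.log(beta / (1.0 - beta))


def inverse_normal(values):
    """Rank-based inverse-normal transform of a list of numbers.

    Each value is replaced by the standard normal quantile of
    (rank - 0.5) / n. The 0.5 offset is an assumed convention.
    """
    n = len(values)
    # Indices sorted by value; Python's sort is stable, so ties keep input order.
    order = sorted(range(n), key=lambda i: values[i])
    standard_normal = NormalDist()
    transformed = [0.0] * n
    for rank, index in enumerate(order, start=1):
        transformed[index] = standard_normal.inv_cdf((rank - 0.5) / n)
    return transformed
```

In practice `inverse_normal` would be applied to the regression residuals for each probe, so that every probe has the same approximately normal distribution before association testing.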

Robinson et al. 2015, Nature Genetics

Title: Population genetic differentiation of height and body mass index across Europe

withinfam_summary_ht_bmi_release_March2016.tar.gz: within-family genome-wide association statistics from the sibling pair data generated using the QFAM approach in PLINK. Estimates are for HapMap3 SNPs, which passed QC thresholds of MAF > 1%, HWE < 1×10⁻⁶ and imputation info score > 0.4. Columns are chromosome, SNP rs identity, the coded allele, the coded allele frequency, effect size, approximate standard error, and permutation p-value from 100,000 permutations.
Corrected height data and Corrected BMI data: These files are a replacement for the file with summary statistics that were publicly released in March 2016. README

The reason for the replacement is that it was recently reported that there were issues with the publicly released within-family SNP effect estimates from the Robinson et al. study. These summary data were publicly released in response to a request from J. Pritchard and colleagues for a study that was published later that year (Field et al., Science Vol. 354, Issue 6313, pp. 760-764, 11 Nov 2016). The request from the Field et al. study authors was for statistics in addition to those produced for use in the Robinson et al. study. We therefore re-ran the within-family analysis using PLINK v1.90b3c 64-bit, whereas the actual within-family estimates used in the published Robinson et al. paper from 2015 were produced using PLINK v1.07.

We have now found that PLINK version v1.90b3c contained a bug which can result in a mishandling of family structure and likely no control for population structure within the analysis (the current software version reports this error as now corrected). Results from PLINK v1.07 used in Robinson et al. 2015 are replicated when using updated estimates from the latest corrected PLINK versions and when using an independent in-house implementation of sib-regression.

We apologise for the inconvenience caused.

Yang et al. 2015, Nature Genetics

Title: Estimation of genetic variance from imputed sequence variants reveals negligible missing heritability for human height and body mass index

LDSCORE_release_July2015.tar.gz: per-SNP and per-segment LD scores calculated from 44,126 unrelated individuals and ~17M imputed variants. Columns are SNP, per-SNP LD score, and per-segment LD score.
GWAS_summary_release_July2015.tar.gz: GWAS summary data. Columns are SNP, the coded allele, effect size, and standard error.

Hemani et al. 2013, AJHG

Title: Inference of the Genetic Architecture Underlying BMI and Height with the Use of 20,240 Sibling Pairs

hemani_pihat.txt: Estimated genome-wide realised additive (first column) and dominance (second column) coefficients from 20,240 sibling pairs. If the genome-wide average posterior probabilities of sharing 0, 1 and 2 alleles identical-by-descent are P0, P1 and P2, then the first column contains (0.5*P1 + P2) and the second column contains P2.
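
The two columns can be reproduced from the IBD sharing probabilities with the formulas given above. The sketch below is a worked example only; the function name `realised_coefficients` and the tolerance used to check that the probabilities sum to one are assumptions.

```python
def realised_coefficients(p0, p1, p2, tolerance=1e-6):
    """Return (additive, dominance) coefficients for one sibling pair.

    p0, p1, p2: genome-wide average posterior probabilities of sharing
    0, 1 and 2 alleles identical-by-descent; they should sum to 1.
    additive  = 0.5 * P1 + P2   (first column of the file)
    dominance = P2              (second column of the file)
    """
    if min(p0, p1, p2) < 0.0:
        raise ValueError("probabilities must be non-negative")
    if abs(p0 + p1 + p2 - 1.0) > tolerance:
        raise ValueError("probabilities must sum to 1")
    return 0.5 * p1 + p2, p2
```

For the expected full-sibling sharing of P0 = 0.25, P1 = 0.5 and P2 = 0.25, this gives an additive coefficient of 0.5 and a dominance coefficient of 0.25; the values in the file vary around these expectations from pair to pair.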