This practical has been adapted from International Statistical Genetics Workshop and the GenomicSEM wiki

This practical is available from https://cnsgenomics.com/data/teaching/GNGWS24/module4

Objective

The objective of this practical is to encourage you to think about how SEM could help you understand trait relationships within your own work.

GenomicSEM is a nice starting place as it only requires summary statistics but there will be a learning curve with the syntax! In this practical we will introduce GenomicSEM and help familiarise the basics.


Recap: GenomicSEM

What is GenomicSEM?

  • Application of SEM to summary level GWAS data for modelling the joint genetic architecture of genetically correlated traits.

  • Can be used to estimate the shared information across correlated polygenic traits either aggregate (genome-wide) or at the individual SNP level.

  • Utilises the genetic variance-covariance matrix estimated from summary-level data using LDSC

(Grotzinger et al. 2019): https://www.nature.com/articles/s41562-019-0566-x



Let’s start with a scenario

We know there is a large degree of comorbidity across psychiatric disorders, in particular major depressive disorder, anxiety, alcohol use and PTSD.

It has been hypothesised the key mechanisms for risk of each of these psychiatric disorders operate through a general factor. This is supported by moderate to high genetic correlations across the 4 disorders.


Given the availability of summary statistics for each of these disorders, we can use GenomicSEM to examine the shared genetic architecture and construct the latent psychiatric factor F above (common factor model). We can then determine if there are any genetic variants associated with the factor by adding individual SNPs to the model (common factor model with individual SNP effects). The final part of the practical introduces you to some more complex models and is an opportunity for you to brainstorm how GenomicSEM (or SEM, more generally) can be informative for the questions you are trying to answer.



Set up

We are going to run this practical using R on the cluster. To log in start by opening the terminal.

You can login using the following command:
ssh <username>@<host>

You should have received a welcome package email that contains your username, host and password.

The data for this practical is located in /data/module4/Practice_3_genomicSEM/data

If you haven’t already you will want to make a directory within your scratch for this module e.g.

mkdir -p /scratch/<username>/module4/Practice_3_genomicSEM

cd /scratch/<username>/module4/Practice_3_genomicSEM

Open R by typing R

# load in the package
library(GenomicSEM)

There will be 25 warnings (that’s expected..)


Data

We will be using the following data

  • Major Depressive Disorder - cases=170k; controls=329k; (Howard et al. 2018)

  • Anxiety Disorders - cases=32k, controls=82; (Purves et al. 2019)

  • Alcohol use disorder - cases=8.5k; controls=20k; (Walters et al. 2018)

  • PTSD - cases=2.5k; controls=7k; (Duncan et al. 2017)

Description of preprocessing for these files is described here

The data will be available in the following directory /data/module4/Practice_3_genomicSEM/data


Things to consider about the data if you intend to run genomicSEM

  • you must use the same ancestral background across your summary statistics AND that this ancestral background must match that of the LD-scores used for the ldsc function described below

  • participant samples used to produce the summary statistics can range from entirely overlapping to entirely independent and you DO NOT need to know the level of overlap in order to run Genomic SEM; this is one of the major advantages of the method.

  • you want the full or very lightly cleaned summary statistics, not pruned or conditionally independent SNPs - this has implications for estimation of variance components (heritability and correlations)



Common factor model

We are going to run the common factor model in the path diagram above. In practice, you would first investigate possible underlying factor structure with methods such as exploratory factor analysis, but here, we are just going to fit the model.

There are three steps to running a common factor model that does not include SNP effects

  1. Munge the summary statistics (munge)

  2. Run LDSC to obtain the genetic covariance and sample covariance matrices (ldsc)

  3. Specify and the run model (usermodel)


Munge the files

The first step in running a Genomic SEM model is to munge the summary statistics. This is akin to harmonising the data. The munge function works to convert the summary statistics to the format expected by LDSC (i.e., on a z-statistic metric). The summary statistics files input into the munge function at a minimum need to contain five pieces of information:

  1. The rsID of the SNP.
  2. An A1 allele column, with A1 indicating the effect allele.
  3. An A2 allele column, with A2 indicating the non-effect allele.
  4. Either a logistic or continuous regression effect.
  5. The p-value associated with this effect.
# 1. files = the name of the summary statistics files
files = c("/data/module4/Practice_3_genomicSEM/data/ALCH_withrsID.txt",
          "/data/module4/Practice_3_genomicSEM/data/SORTED_PTSD_EA9_ALL_study_specific_PCs1.txt", 
          "/data/module4/Practice_3_genomicSEM/data/MDD_withNeff.txt",
          "/data/module4/Practice_3_genomicSEM/data/ANX_withNeff.txt")

#2. hm3 = the name of the reference file to use for alligning effects to same ref allele across traits
hm3 <-"/data/module4/Practice_3_genomicSEM/data/eur_w_ld_chr/w_hm3.snplist"

#3. trait.names = names used to create the .sumstats.gz output files
trait.names<-c("ALCH", "PTSD", "MDD", "ANX")

#4. N = total sample size for traits
# All but PTSD have SNP-specific sum of effective sample sizes so only its sample size is listed here
N=c(NA,5831.346,NA,NA)

#Run the munge function. This creates four .sumstats.gz files (e.g., PTSD.sumstats.gz).
munge(files=files,hm3=hm3, trait.names=trait.names, N=N)

This will take a couple of minutes. Have a read of the output log as this runs. It should give you an idea of what is happening. In general this function is very good at interpreting the files, but you will need to check it has done this correctly!



Run LDSC

The second step is to run multivariable LD-Score regression to obtain the genetic covariance (S) matrix and corresponding sampling covariance matrix (V).

The ldsc function takes 5 necessary arguments:

#1. traits = the name of the .sumstats.gz traits
traits<-c("ALCH.sumstats.gz", "PTSD.sumstats.gz", "MDD.sumstats.gz", "ANX.sumstats.gz")

#2. sample.prev = the proportion of cases to total sample size. For quantitative traits list NA
#here we use .5 because we are using the sum of effective sample size across cohorts
#which is more appropriate for performing the asscertainment correction. 
sample.prev <- c(.5,.5,.5,.5)

#3. population.prev : the population lifetime prevalence of the traits. For quantitative traits list NA
population.prev <-c(.159, .3, .15,.20)

##4. ld: folder of LD scores 
#can be found on the genomicSEM github (https://github.com/GenomicSEM/GenomicSEM)
ld <- "/data/module4/Practice_3_genomicSEM/data/eur_w_ld_chr/"

#5. wld: folder of LD score weights [typically the same as the ld argument]
wld <- "/data/module4/Practice_3_genomicSEM/data/eur_w_ld_chr/"

#6. trait.names: optional sixth  argument to list trait names so they can be named in your model
trait.names<-c("ALCH", "PTSD", "MDD", "ANX")

#Run the ldsc function 
LDSC_INT<-ldsc(traits=traits, sample.prev=sample.prev, population.prev=population.prev, ld=ld, wld=wld,trait.names=trait.names)

##in many cases will want to save the LDSC data for later use. Example code provided here
# save(LDSC_INT, file = "LDSC_INT.RData")

# Running LDSC for all files took 0 minutes and 29 seconds

The output (named LDSC output here) is a list with 5 named variables in it:

  1. LDSCoutput$S is the covariance matrix (on the liability scale for case/control designs).

  2. LDSCoutput$V which is the sampling covariance matrix in the format expected by lavaan.

  3. LDSCoutput$I is the matrix of LDSC intercepts and cross-trait (i.e., bivariate) intercepts.

  4. LDSCoutput$N contains the sample sizes (N) for the heritabilities and sqrt(N1N2) for the co-heritabilities. These are the sample sizes provided in the munging process.

  5. LDSCoutput$m is the number of SNPs used to construct the LD score.

print(LDSC_INT)
## $V
##               [,1]         [,2]         [,3]          [,4]          [,5]
##  [1,] 6.041603e-04 4.630259e-05 1.329209e-05  2.110818e-05  1.699375e-04
##  [2,] 4.630259e-05 1.683117e-03 5.491624e-06  1.012613e-04  7.541207e-05
##  [3,] 1.329209e-05 5.491624e-06 3.692467e-05  4.865869e-05  8.966010e-05
##  [4,] 2.110818e-05 1.012613e-04 4.865869e-05  2.627033e-04  1.244898e-05
##  [5,] 1.699375e-04 7.541207e-05 8.966010e-05  1.244898e-05  1.135989e-02
##  [6,] 1.928842e-05 1.794540e-05 1.773068e-06 -7.693756e-08  4.536925e-05
##  [7,] 5.939947e-06 9.955430e-05 2.035833e-05  1.510451e-05  3.350365e-04
##  [8,] 3.078558e-06 2.224531e-07 6.028298e-06  6.091000e-06 -1.606011e-05
##  [9,] 6.866515e-06 3.119102e-06 9.777927e-06  1.858721e-05  5.419991e-05
## [10,] 1.292371e-05 1.406092e-05 7.764292e-06  3.524720e-05  1.212087e-06
##                [,6]          [,7]          [,8]         [,9]        [,10]
##  [1,]  1.928842e-05  5.939947e-06  3.078558e-06 6.866515e-06 1.292371e-05
##  [2,]  1.794540e-05  9.955430e-05  2.224531e-07 3.119102e-06 1.406092e-05
##  [3,]  1.773068e-06  2.035833e-05  6.028298e-06 9.777927e-06 7.764292e-06
##  [4,] -7.693756e-08  1.510451e-05  6.091000e-06 1.858721e-05 3.524720e-05
##  [5,]  4.536925e-05  3.350365e-04 -1.606011e-05 5.419991e-05 1.212087e-06
##  [6,]  1.336556e-04  1.255636e-04  7.752931e-06 1.314640e-05 3.972090e-05
##  [7,]  1.255636e-04  8.250458e-04 -4.762316e-07 5.242814e-06 3.447767e-05
##  [8,]  7.752931e-06 -4.762316e-07  1.219319e-05 1.921343e-05 1.590190e-05
##  [9,]  1.314640e-05  5.242814e-06  1.921343e-05 5.109221e-05 5.444954e-05
## [10,]  3.972090e-05  3.447767e-05  1.590190e-05 5.444954e-05 2.843342e-04
## 
## $S
##            ALCH       PTSD        MDD        ANX
## [1,] 0.13938790 0.05977947 0.05943021 0.08500236
## [2,] 0.05977947 0.23937808 0.05799439 0.11428679
## [3,] 0.05943021 0.05799439 0.08503281 0.12667327
## [4,] 0.08500236 0.11428679 0.12667327 0.23329360
## 
## $I
##            [,1]        [,2]        [,3]        [,4]
## [1,] 1.01069811 0.023791915 0.015906770 0.017004740
## [2,] 0.02379192 0.993973587 0.004565391 0.003975452
## [3,] 0.01590677 0.004565391 0.994969603 0.262655462
## [4,] 0.01700474 0.003975452 0.262655462 1.007313416
## 
## $N
##          [,1]     [,2]     [,3]     [,4]     [,5]     [,6]     [,7]   [,8]
## [1,] 21647.63 11235.43 96488.81 32830.69 5831.346 49943.61 16951.93 427751
##          [,9]    [,10]
## [1,] 145314.9 49279.86
## 
## $m
## [1] 1173569

How well do you understand the output. Can you use the S matrix to obtain the genetic correlation between MDD and Anxiety?

# Genetic correlation is the genetic covariance between traits scaled by the sqrt of the product of their genetic variances 
# The S matrix contains the genetic variances on the diagonals and the genetic covariances on the off diagonals
LDSC_INT$S[3,4]/sqrt(LDSC_INT$S[3,3]*LDSC_INT$S[4,4])
# This should match the genetic correlation reported in the ldsc output log 


Specify and run the model

We use the lavaan language to specify the model. Here is a quick introduction to the syntax

For example:

We are going to run a common factor model (Model3). An explanation of the output is provided below

#load in the LDSC object made in step 2 above
#load("LDSC_INT.RData")

#usermodel takes two necessary arguments:
#1. covstruc: the output from multivariable ldsc 
covstruc<-LDSC_INT

#2. model: the user specified model
#here we specify a common factor model
INT.model<-"F1=~MDD+PTSD+ALCH+ANX"

#std.lv: optional argument specifying whether variances of latent variables should be set to 1
std.lv=TRUE # Otherwise the first factor loading will be set to 1
# One of the parameters will need to be set to 1 to make our model identifiable 

#Run your model
IntResults <- usermodel(covstruc=covstruc, model=INT.model, std.lv=std.lv)
## [1] "Running primary model"
## [1] "Calculating CFI"
## [1] "Calculating Standardized Results"
## [1] "Calculating SRMR"
## elapsed 
##   0.262
#print the results of your model
IntResults$results
##    lhs op  rhs Unstand_Est         Unstand_SE STD_Genotype    STD_Genotype_SE
## 5   F1 =~  MDD 0.283806747 0.0210027454143111   0.97326125 0.0720249151103093
## 6   F1 =~ PTSD 0.221278068 0.0402049336519291   0.45226835 0.0821745168615321
## 3   F1 =~ ALCH 0.205225639 0.0243218023255069   0.54969157 0.0651453167312673
## 4   F1 =~  ANX 0.445784745 0.0324561138311457   0.92294074 0.0671962594330495
## 1 ALCH ~~ ALCH 0.097270350 0.0258991464601678   0.69783919  0.185806270394915
## 9 PTSD ~~ PTSD 0.190414108  0.106977265855824   0.79545336  0.446896663107149
## 8  MDD ~~  MDD 0.004486541 0.0109566843444685   0.05276254   0.12885240971739
## 2  ANX ~~  ANX 0.034569557 0.0301103372628622   0.14818043  0.129066285483485
## 7   F1 ~~   F1 1.000000000                      1.00000000                   
##      STD_All      p_value
## 5 0.97326125 1.313540e-41
## 6 0.45226834 3.717880e-08
## 3 0.54969157 3.230100e-17
## 4 0.92294072 6.265549e-43
## 1 0.69783918 1.728330e-04
## 9 0.79545335 7.508426e-02
## 8 0.05276254 6.821876e-01
## 2 0.14818043 2.509289e-01
## 7 1.00000000           NA
#print the model fit of your model
IntResults$modelfit
##       chisq df  p_chisq      AIC CFI       SRMR
## df 1.283452  2 0.526383 17.28345   1 0.03621695

This produces an R object, IntResults, that contains the two elements shown above:


IntResults$results prints the unstandardized and standardized results for the common factor model.

  • “F1 =~ ?” lists the indicator loadings on the common factors.

  • “V ~~ V” lists the residual variances of the indicators after removing the variance explained by the common factor.

  • Standardized results (STD_Genotype and STD_Genotype_SE) are obtained from the standardized input (i.e., a genetic correlation matrix) vs unstandardised (Unstand_Est and Unstand_SE) are based on the genetic covariance matrix. This means standardised results are standardized with respect to the genetic variance in your phenotypes. However, these results will not be fully standardized if the latent factor is not itself specified to have a variance of 1.0

    • In the standardized case of a common factor model (“V ~~ V” + “F1 =~ V”^2) will sum to 1.


IntResults$modelfit prints the model chi-square, degrees of freedom, the p-value for the chi-square test, Aikake Information Criterion (AIC), Comparative Fit Index (CFI), and Standardized Root Mean Square Residual (SRMR). This tells us how well the observed covariance matrix fits the implied covariance matrix.


Can you fill out the common factor model diagram above with the factor loadings and variance terms?



These results can be inserted into a path diagram to produce the following:



Common factor model with individual SNP effects

We can now add individual SNP effects onto our factor.

A powerful extension of Genomic SEM is to run a multivariate GWAS for which a common factor defined by genetic indicators is regressed on a SNP. This allows for estimation of a set of summary statistics for the common factor that represent the SNP effects on the common factor.

For this part of the analysis we are going to use a drastically reduced subsets of SNPs so the model doesn’t take too long to run. These data have already been harmonised so we can skip the munge step from before. We will use the covariance matrix we made earlier in the LDSC step, LDSC_INT.

Prepare summary statistics

As we are regressing the factor onto SNPs we need to summary statistics as well as our LDSC covariance matrix.

#Takes four necessary arguments along with other optional arguments that 
#you may need for your use case. Here we need six arguments to run the function 

#1. files = the name of the summary statistics file
##**note that these are a drastically reduced subsets of SNPs for the practical ONLY
##*that reflect a selected set of chromosome 4 variants 
##*Also note that these need to be in the same order as your ldsc object 
files<-c("/data/module4/Practice_3_genomicSEM/data/ALCH4.txt", 
         "/data/module4/Practice_3_genomicSEM/data/PTSD4.txt", 
         "/data/module4/Practice_3_genomicSEM/data/MDD4.txt", 
         "/data/module4/Practice_3_genomicSEM/data/ANX4.txt")

#2. ref = the name of the reference file used to obtain SNP MAF
#**note again that this is a drastically reduced subset of SNPs for chromosome 4
#the full reference set is available on the genomicSEM github (https://github.com/GenomicSEM/GenomicSEM)
ref <- "/data/module4/Practice_3_genomicSEM/data/reference.1000G.ch4.txt" 

#3. trait.names = the name of the traits to use for column labeling
trait.names<-c("ALCH","PTSD", "MDD", "ANX")

#4. se.logit = whether the standard errors are on an logistic scale
se.logit<-c(F,T,T,T)

#5. linprob: whether it was a binary outcome that was analyzed as continuoue -or-
#it is a file with only Z-statistics. This is true for ALCH for our data
linprob<-c(T,F,F,F)

#6. sample size. This is only needed for continuous outcomes or outcomes where linprob is TRUE
#we do not provide sample size for ALCH as it is already a column in the GWAS data 
N<-c(NA,NA,NA,NA)

#run sumstats putting these pieces together 
INT_sumstats<-sumstats(files=files,ref=ref,trait.names=trait.names,se.logit=se.logit,linprob=linprob,N=N)
# Running sumstats for all files took 4 minutes and 52.6425039768219 seconds


#would typically the output for future use
#write.csv(INT_sumstats, file = "INT4_sumstats.csv")

Specify and run the model

To add a SNP to our common factor model we include a single line that specifies we want to regress the factor on a SNP by adding F1~SNP to the model command. The model is then run for each SNP individually.

We specify for the model output to only save the regression of the factor on the SNP using the sub option (otherwise, the output will contain all factor loading and variance terms estimated for each SNP).

#userGWAS takes three necessary arguments:
#1. covstruc = the output from the ldsc function
covstruc<-LDSC_INT

#2. SNPs = output from sumstats function
SNPs<-INT_sumstats

#3. model = the model to be run
#going to troubleshoot estimated ov variances are negative for 4 SNPs 
#by adding model constraint for all residuals to be above 0
model<-"F1=~MDD+PTSD+ALCH+ANX
F1~SNP"

#4. sub = optional argument specifying component of model output to be saved
sub<-"F1~SNP"

#5. parallel = optional argument specifying whether it should be run in parallel
#set to FALSE here just for the practical
parallel<-FALSE

#6. Q_SNP = optional argument specifying whether you want 
#the heterogeneity index calculated for your factors
Q_SNP<-TRUE

#run the multivariate GWAS below
INT_GWAS<-userGWAS(covstruc=covstruc, SNPs=SNPs, model=model, sub=sub, parallel=parallel, Q_SNP=Q_SNP)
# sub model means we only get the factor loading and variance of the SNP

Lets look at the results

##print the first five rows of the userGWAS output
INT_GWAS[[1]][1:5,]
##          SNP CHR    BP       MAF A1 A2 lhs op rhs free label          est
## 1 rs10030871   4 68786 0.0765408  T  C  F1  ~ SNP    6       9.072853e-04
## 2  rs6599368   4 69567 0.0755467  A  T  F1  ~ SNP    6       1.009834e-03
## 3  rs7678633   4 69713 0.0755467  G  A  F1  ~ SNP    6       1.051505e-03
## 4 rs13130581   4 70392 0.0725646  A  G  F1  ~ SNP    6       8.300959e-05
## 5 rs13125929   4 71566 0.0725646  T  C  F1  ~ SNP    6       4.857341e-04
##            SE Z_Estimate Pval_Estimate    chisq chisq_df chisq_pval      AIC
## 1 0.004121185 0.22015156     0.8257531 4.905573        8  0.7676193 18.90557
## 2 0.004110347 0.24568102     0.8059292 4.839512        8  0.7745835 18.83951
## 3 0.004106863 0.25603613     0.7979229 4.901614        8  0.7680382 18.90161
## 4 0.004146428 0.02001954     0.9840278 5.029483        8  0.7544203 19.02948
## 5 0.004153705 0.11693995     0.9069076 5.787671        8  0.6710020 19.78767
##      Q_SNP Q_SNP_df Q_SNP_pval error warning
## 1 3.622121        3  0.3052654     0       0
## 2 3.556059        3  0.3135639     0       0
## 3 3.618163        3  0.3057570     0       0
## 4 3.746033        3  0.2902261     0       0
## 5 4.504216        3  0.2119145     0       0

We essentially have a GWAS of our latent factor

#examine warnings; (0 = good to go)
table(INT_GWAS[[1]]$warning)
## 
##    0 
## 1000
#examine errors; (0 = good to go)
table(INT_GWAS[[1]]$error)
## 
##    0 
## 1000

Are there any genome-wide significant factor hits?

#any genome-wide significant factor hits?
table(INT_GWAS[[1]]$Pval_Estimate < 5e-8)
## 
## FALSE 
##  1000

What about genome-wide significant Q_SNP hits? This is a test of whether the effect of the SNP operates through the common factor or on the individual traits. It can serve as a quality control metric to remove SNPs that do not operate via the factor. In that sense, it also identifies trait-specific SNPs relative to the other traits that load on a factor.

#genome-wide significant Q_hits?
table(INT_GWAS[[1]]$Q_SNP_pval < 5e-8)

#trending significant Q_hits? 
table(INT_GWAS[[1]]$Q_SNP_pval < 1e-5)

#What are these hits (it would be interesting to look at what gene region they are in?)
INT_GWAS[[1]][(INT_GWAS[[1]]$Q_SNP_pval < 1e-5)==TRUE,]




How could you use GenomicSEM?


Below are two examples of more complex models that can be fit using GenomicSEM.


Optional exercise

Take a couple of minutes to chat with the people on your desk about the projects you are working on at the moment. Can you conceive of any ways in which GenomicSEM (or SEM) could be applied within your own projects? Perhaps you can draw this as a path diagram?

For example: Understanding how risk factors contribute to disease? Do they operate through shared pathways, or do they act on the disease through different genetic mechanisms?


Common factor models with multiple factors

We can use the common factor model to understand the underlying structure among different traits.



GWAS by subtraction

This model can be used to explore how genetically correlated traits differ. For example, educational attainment and cognitive performance are strongly genetically correlated. What contributes to the genetic component of educational attainment other than cognitive ability? We can subtract out the influence of cognitive performance from educational attainment to construct a latent variable Non-cog. Adding a SNP to this model allows us to understand the individual genetic components that contribute to Non-cog.

(Demange et al. 2021) https://www.nature.com/articles/s41588-020-00754-2

Code tutorial



References

Demange, Perline A., Margherita Malanchini, Travis T. Mallard, Pietro Biroli, Simon R. Cox, Andrew D. Grotzinger, Elliot M. Tucker-Drob, et al. 2021. “Investigating the Genetic Architecture of Noncognitive Skills Using GWAS-by-Subtraction.” Nature Genetics 53 (1): 35–44. https://doi.org/10.1038/s41588-020-00754-2.
Duncan, L E, A Ratanatharathorn, A E Aiello, L M Almli, A B Amstadter, A E Ashley-Koch, D G Baker, et al. 2017. “Largest GWAS of PTSD (N=20070) Yields Genetic Overlap with Schizophrenia and Sex Differences in Heritability.” Molecular Psychiatry 23 (3): 666–73. https://doi.org/10.1038/mp.2017.77.
Grotzinger, Andrew D., Mijke Rhemtulla, Ronald de Vlaming, Stuart J. Ritchie, Travis T. Mallard, W. David Hill, Hill F. Ip, et al. 2019. “Genomic Structural Equation Modelling Provides Insights into the Multivariate Genetic Architecture of Complex Traits.” Nature Human Behaviour 3 (5): 513–25. https://doi.org/10.1038/s41562-019-0566-x.
Howard, David M., Mark J. Adams, Masoud Shirali, Toni-Kim Clarke, Riccardo E. Marioni, Gail Davies, Jonathan R. I. Coleman, et al. 2018. “Genome-Wide Association Study of Depression Phenotypes in UK Biobank Identifies Variants in Excitatory Synaptic Pathways.” Nature Communications 9 (1). https://doi.org/10.1038/s41467-018-03819-3.
Purves, Kirstin L., Jonathan R. I. Coleman, Sandra M. Meier, Christopher Rayner, Katrina A. S. Davis, Rosa Cheesman, Marie Bækvad-Hansen, et al. 2019. “A Major Role for Common Genetic Variation in Anxiety Disorders.” Molecular Psychiatry 25 (12): 3292–3303. https://doi.org/10.1038/s41380-019-0559-1.
Walters, Raymond K., Renato Polimanti, Emma C. Johnson, Jeanette N. McClintick, Mark J. Adams, Amy E. Adkins, Fazil Aliev, et al. 2018. “Transancestral GWAS of Alcohol Dependence Reveals Common Genetic Underpinnings with Psychiatric Disorders.” Nature Neuroscience 21 (12): 1656–69. https://doi.org/10.1038/s41593-018-0275-1.