This runs on the GGWS server.
Create a directory (within your home directory) where to run the practical
cd
cd module3
mkdir prac4/
cd prac4
Load the practical R
objects.
load("/data/module3/Prac4_binary/dataForPrac4.RData")
We provide an R object named PHENOMAT, which contains N=10,000 rows corresponding to N independent families from the population and 4 columns corresponding to fathers, mothers, sisters and brothers from these families. Each entry (Y_ij) corresponds to the measured trait of individual j (j=1 for fathers, 2 for mothers, 3 for sisters and 4 for brothers) in family i=1,…,N.
Question 1. Calculate the phenotypic correlation between siblings, as well as hte regression of the average phenotype of siblings onto the average phenotypes of parents. What can you say about the heritability of that trait under the assumption that parents are unrelated (i.e. independent)? Are these results consistent? If not then what could explain this inconsistency? How can you verify your assumptions?
correlations_between_all_pairs <- cor(PHENOMAT)
midParent <- 0.5*(PHENOMAT[,"FATHER"] + PHENOMAT[,"MOTHER"])
midChild <- 0.5*(PHENOMAT[,"SISTER"] + PHENOMAT[,"BROTHER"])
Compare
lm(midChild~midParent)
and
cor(PHENOMAT[,"SISTER"],PHENOMAT[,"BROTHER"])
Question 2. We now assume that the trait in question defines the liability of a certain disease in the population. If the prevalence for that disease is K=5%, then what can you say about the absolute risk (KR) of developing the disease among individuals with an affected brother. To answer that question, we would assume that all resemblance between relatives is due to additive genetic effects.
K=0.05
t=qnorm(1-K)
z=dnorm(t)
i=z/K
k=i*(i-t)
aR=0.5 # coefficient of co-ancestry (genetic relatedness) between siblings
tR=(t-aR*i*h2)/sqrt(1-aR*aR*h2*h2*k)
KR=1-pnorm(tR)
Question 3. Using the threshold (t) above which individuals are classified as diseased, estimate KR empirically for siblings pairs and for mothers-daughters pairs using the data provided in PHENOMAT (Tip: KR = P(Y>t|Y_R>t) = P(Y>t and Y_R>t)/P(Y_R>t) = P(Y>t and Y_R>t)/K ). How does it compare with your results from Question 2?
We provide a genomic relationship matrix (GRM) across 3000 individuals and a phenotype file with 1 quantitative trait (assumed to be an observed liability to disease) and 9 binary traits. These binary traits are obtained from discretizing the quantitative trait under a liability threshold model. The first binary trait corresponds to a prevalence K=0.1, the second to K=0.2,…, and the last one to a prevalence K=0.9.
Use GCTA to estimate the heritability of the observed liability
gcta64 --grm /data/module3/Prac4_binary/myData --pheno /data/module3/Prac4_binary/myData.phen \
--mpheno 1 --reml --out observed_liability
Estimate the heritability on the observed (0/1) scale of the first binary trait
gcta64 --grm /data/module3/Prac4_binary/myData --pheno /data/module3/Prac4_binary/myData.phen \
--mpheno 2 --reml --out observed_scale_K_0.1`
Re-estimate the heritability of the first binary
trait but now on the liability scale using GCTA option
--prevalence
gcta64 --grm /data/module3/Prac4_binary/myData --pheno /data/module3/Prac4_binary/myData.phen \
--mpheno 2 --prevalence 0.1 --reml --out observed_liability_scale_K_0.1
Verify the formula \(h^2_l = h^2_o \left(
\frac{K(1-K)}{z^2(K)} \right)\), where
z=dnorm(qnorm(1-K))
.
Estimate the heritability on the observed (0/1) scale and the on the liability scale for all remaining binary traits. Plot these estimates against each other.