I have conducted a large-scale GWAS and found a few significantly associated SNPs. I ran GEMMA with the -lmm 1 option to obtain the beta and standard-error estimates. I want to estimate the percentage of phenotypic variation explained by each of the significant SNPs. I used the following procedure to estimate the variance explained in R:
fit <- lm(Phenotypic_value ~ SNP_data, data = a)
summary(fit)$r.squared
Here, the data file a contains three columns, namely sample_ID, Phenotypic_value for each sample, and the biallelic SNP_data. I got a value of 0.43, meaning that the SNP explains 43% of the phenotypic variation.
I then tried another formula: 2*f*(1-f)*b.alt^2. Here, f is the minor allele frequency and b.alt is the effect size, i.e. the beta estimate obtained from GEMMA. This gives a value of 0.03, meaning 3% of the variation explained, which seems reasonable to me.
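The arithmetic of this second formula can be sketched in R as below. The values of f and b.alt are hypothetical placeholders, not my actual estimates. The division by the phenotypic variance is my own assumption: as far as I understand, 2*f*(1-f)*b.alt^2 is only a proportion when the phenotype has unit variance.

```r
# Hypothetical placeholder values, not results from the GWAS described above
f <- 0.25        # minor allele frequency of the SNP
b.alt <- 0.4     # beta estimate for the SNP, as reported by GEMMA
var_y <- 1       # phenotypic variance; 1 assumes a standardised phenotype

# Variance attributed to the SNP, assuming Hardy-Weinberg proportions
snp_var <- 2 * f * (1 - f) * b.alt^2

# Proportion of phenotypic variance explained by the SNP
pve <- snp_var / var_y
print(pve)
```

If the phenotype was not standardised, var_y would presumably need to be replaced by the sample variance of the phenotype.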
My question is: which of these methods is correct? Or is there another way to estimate the percentage of variation explained?
Alternatively, from the GEMMA Google group, I got this formula: pve <- var(x) * (beta^2 + se^2)/var(y). But I do not understand how to obtain the values of var(x) and var(y).
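My tentative reading, which I have not been able to confirm, is that x is the vector of genotype dosages (0/1/2) for the SNP and y is the vector of phenotype values used in the GWAS. Under that assumption, a minimal sketch with simulated data would look like the following. Here beta and se stand in for the GEMMA estimates, and every value is made up for illustration.

```r
set.seed(1)

# Simulated stand-ins, assuming x = genotype dosages and y = phenotype values
n <- 500
x <- rbinom(n, size = 2, prob = 0.25)   # hypothetical 0/1/2 genotype dosages
y <- 0.4 * x + rnorm(n)                 # hypothetical phenotype values

# Placeholders for the beta and standard error that GEMMA would report
beta <- 0.4
se <- 0.05

# Formula from the GEMMA Google group, applied as written
pve <- var(x) * (beta^2 + se^2) / var(y)
print(pve)
```

With the data file a described above, this would presumably correspond to var(a$SNP_data) and var(a$Phenotypic_value), provided SNP_data is coded numerically.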
It would be great to receive some feedback on this. Thank you.