Can anyone explain the theoretical consequences of a traditional variance-stabilizing transformation for the Poisson, such as the square root, versus "projection to a normal distribution", and the pros and cons of each? I am familiar with the traditional square-root transformation, but I came across this "projection to normality" in a paper (details below). After digging through their code I understand how they actually perform the transform:
1. Calculate mu and sigma for the data vector.
2. Convert the data vector to its percentiles (I am guessing from the ECDF somehow).
3. Use the inverse CDF of the normal with that mu and sigma to transform the percentiles into normal variates.
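For reference, the traditional transformation I have in mind is just taking the square root of the counts. A quick R sketch (my own illustration, not from the paper) of why it stabilizes the variance at roughly 1/4:

# If X ~ Poisson(lambda), Var(sqrt(X)) is approximately 1/4 for
# moderately large lambda, regardless of lambda.
set.seed(1)
lambdas <- c(5, 20, 100)
sapply(lambdas, function(l) {
  x <- rpois(1e5, l)
  var(sqrt(x))   # should be close to 0.25 for each lambda
})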
The paper is from PLOS Computational Biology. They use a glasso-type approach to model gene expression networks from RNA-seq data, which are typically counts. Specifically, in the methods section they say: "Normalization of Data For each read count ni in each sample, we computed the normalized read count ri = log2(2 + C ⋅ ni/n) ........ Because GMRFs are designed for Gaussian data, we projected all samples for each transcript for each tissue onto a Gaussian with variance 1."
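Their normalization step, as I read it, would be something like this in R (C and n are whatever the paper defines them to be; their definitions fall in the part of the quote I elided, so here they are just arguments):

# ri = log2(2 + C * ni / n) for each read count ni in a sample;
# C and n are taken as given (defined in the paper, elided above).
normalize_counts <- function(ni, C, n) {
  log2(2 + C * ni / n)
}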
MATLAB code from the paper for the transformation:
function v = gaussianProject(x)
%GAUSSIANPROJECT Projects x onto a Gaussian v.
%   percentile() is a helper from the paper's code; it appears to return
%   the empirical CDF value (percentile) of each element of x.
p = percentile(x);
p(p == 1) = .99;      % otherwise these values get sent to infinity by norminv
mu = mean(x);
sigma = std(x);
if sigma == 0         % guard against constant vectors
    sigma = 1;
end
v = norminv(p, mu, sigma);   % inverse normal CDF evaluated at each percentile
end
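Since I plan to work in R, here is my attempt at an equivalent (assuming the paper's percentile() helper returns empirical CDF values; the function name below is mine):

# My R interpretation of gaussianProject(); assumes percentile() in the
# MATLAB code returns the empirical CDF value of each element of x.
gaussian_project <- function(x) {
  p <- ecdf(x)(x)             # empirical CDF evaluated at each observation
  p[p == 1] <- 0.99           # avoid mapping the maximum to +Inf
  mu <- mean(x)
  sigma <- sd(x)
  if (sigma == 0) sigma <- 1  # guard against constant vectors
  qnorm(p, mean = mu, sd = sigma)
}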
I am planning to run some simulations in R to see how the two approaches compare, but I would be really grateful if anyone could offer a theoretical explanation.
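Roughly the kind of simulation I have in mind (a minimal sketch using my gaussian_project() interpretation above; all names are mine, not the paper's):

# Compare the square-root VST with the quantile-based projection
# on simulated Poisson counts.
set.seed(42)
x <- rpois(1000, lambda = 10)

# 1) Traditional variance-stabilizing transform
v_sqrt <- sqrt(x)

# 2) Quantile projection onto a normal, as in gaussian_project() above
p <- ecdf(x)(x)
p[p == 1] <- 0.99
v_proj <- qnorm(p, mean = mean(x), sd = sd(x))

# Crude comparisons: variance and a normality test
var(v_sqrt)                    # should be near 0.25
shapiro.test(v_sqrt)$p.value
shapiro.test(v_proj)$p.value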