Background
Compositional data ($x_i>0, \sum_i x_i=c$) are usually analyzed using some kind of log-ratio transformation (alr/clr/ilr), which naturally takes into account the fact that, in the presence of the sum constraint, only the relative scaling of the data values matters (see here, here and also this answer). A particular obstacle to this approach is posed by data containing zeros, which must first be imputed using one of the available imputation strategies - see here. A less serious difficulty is having to work on a simplex, with its non-intuitive geometry (although I suppose the intuition comes with practice).
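To make the zero problem concrete, here is a minimal sketch of the clr transform, assuming NumPy; the function name `clr` and the example vectors are purely illustrative. It shows that clr ignores the overall scale of the data but fails as soon as a single part is zero.

```python
import numpy as np

def clr(x):
    """Centred log-ratio transform of a single composition (1-D array of parts)."""
    log_x = np.log(np.asarray(x, dtype=float))
    return log_x - log_x.mean()

# A composition without zeros: clr is well defined and scale invariant.
x = np.array([0.2, 0.3, 0.5])
z = clr(x)
z_scaled = clr(10 * x)  # same composition, different total c

# A composition with a zero part: log(0) = -inf, so no coordinate stays finite.
x_zero = np.array([0.0, 0.4, 0.6])
with np.errstate(divide="ignore", invalid="ignore"):
    z_zero = clr(x_zero)

print(np.allclose(z, z_scaled))   # scale invariance of clr
print(np.isfinite(z_zero).any())  # whether any clr coordinate is usable
```

This is why zeros have to be replaced by small positive values before any log-ratio analysis.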
Possible alternative
A possible alternative is a square root transformation, $y_i=\sqrt{x_i}$, which confines the data points to a sphere of radius $\sqrt{c}$, $\sum_i y_i^2=c$. One can then define the distance between points as the cosine distance (or geodesic distance), which permits application of other statistical techniques. The advantages are:
- spherical geometry (particularly the distances in this geometry) is more intuitive
- there is no need for imputation of zeros
- the fact that $y_i^2 \geq 0$ holds automatically may facilitate application of methods/algorithms (statistical or numerical) that do not keep $y_i$ positive.
The particular disadvantage is that this approach does not directly take into account the scaling nature of the data.
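A minimal sketch of the square root approach, assuming NumPy. The helper names `to_sphere` and `geodesic_distance` are hypothetical, and the data are first rescaled to $c=1$ (an assumption made here so that the points lie on the unit sphere and the distance is simply the angle between them).

```python
import numpy as np

def to_sphere(x):
    """Map a composition to the unit sphere: y_i = sqrt(x_i / sum(x))."""
    x = np.asarray(x, dtype=float)
    return np.sqrt(x / x.sum())

def geodesic_distance(x1, x2):
    """Angle (in radians) between the square-root-transformed compositions."""
    cos_angle = np.dot(to_sphere(x1), to_sphere(x2))
    # Clip to guard against rounding slightly outside [0, 1].
    return np.arccos(np.clip(cos_angle, 0.0, 1.0))

# Zeros need no special treatment.
a = [0.0, 0.4, 0.6]
b = [0.5, 0.5, 0.0]
d_ab = geodesic_distance(a, b)

# Compositions with disjoint support are orthogonal on the sphere,
# so they sit at the largest possible angle, pi / 2.
d_max = geodesic_distance([1, 0, 0], [0, 1, 0])

print(d_ab, d_max)
```

The cosine distance would be `1 - cos_angle` in the same notation; both are computed from the same inner product.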
Question
What are the particular pitfalls that I may be missing here (but which explain why this approach does not seem broadly used)? Are there alternative ways of dealing with compositional data containing zeros?
Update
Another particularity of this approach is that the distance is bounded from above - this could limit application of some statistical techniques, where the variables should be able to span the whole real axis.
Example
The particular application that I have in mind is gene count data originating from metagenomic analyses of bacterial species. These typically come in the form of count tables, giving the number of times a gene was detected in a sample (more precisely, the number of sequencing reads that mapped onto this gene). The sum over all the genes is referred to as sequencing depth. The zeros may appear either because the corresponding genes are absent from the sample (i.e., the species carrying these genes are absent) or because the genes (species) are present in a very low concentration, undetected at the given sequencing depth.
Update 2
Table 1 of this article presents a range of distance measures used for the analysis of microbial data. In particular, it provides, alongside the Aitchison distance, hypersphere-based measures such as the Bhattacharyya and Hellinger distances.
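For comparison, here is a rough sketch of these three distances using their common textbook definitions, assuming NumPy. The exact conventions in the cited table may differ (e.g. in normalisation constants), and the count vectors below are made up for illustration.

```python
import numpy as np

def closure(x):
    """Rescale counts to proportions summing to 1."""
    x = np.asarray(x, dtype=float)
    return x / x.sum()

def hellinger(p, q):
    """Hellinger distance: Euclidean distance between sqrt-proportions / sqrt(2)."""
    return np.linalg.norm(np.sqrt(closure(p)) - np.sqrt(closure(q))) / np.sqrt(2)

def bhattacharyya(p, q):
    """Bhattacharyya distance: -log of the coefficient sum_i sqrt(p_i * q_i)."""
    return -np.log(np.sum(np.sqrt(closure(p) * closure(q))))

def aitchison(p, q):
    """Aitchison distance: Euclidean distance between clr-transformed proportions."""
    lp, lq = np.log(closure(p)), np.log(closure(q))
    return np.linalg.norm((lp - lp.mean()) - (lq - lq.mean()))

# Hypothetical read counts for four genes in two samples; one count is zero.
p = [10, 0, 30, 60]
q = [5, 15, 40, 40]

h = hellinger(p, q)
b = bhattacharyya(p, q)
with np.errstate(divide="ignore", invalid="ignore"):
    a = aitchison(p, q)  # not finite without prior zero imputation

print(h, b, a)
```

With these definitions the two hypersphere-based measures are functions of the same inner product of the square-root vectors ($H^2 = 1 - e^{-B}$), and both stay finite in the presence of zeros as long as the two samples share at least one non-zero part.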