I've recently been reading about compositional data analysis (CoDa) and am wondering whether it is suitable for the problem I'm currently working on.
I've been trying to find useful representations of chemical compounds that are suitable as input to Machine Learning algorithms.
This has led me to consider the field of compositional data analysis. My only concern is that, in the definition of the simplex given in CoDa resources, all coordinates must be strictly greater than 0, whereas in my case a typical representation of a compound looks something like this:
NaCl : (..., 0, 0, 0.5, 0, 0, ..., 0.5, ...)
Basically, given an enumeration of the periodic table, I take the fractional abundance of each element in the compound (in this case 1/2 Na and 1/2 Cl). Having such a sparse representation makes me wonder whether CoDa is still a suitable choice in my case, since the zeros mean I cannot apply the log transformations in a straightforward manner. What I would like to achieve is a dimensionality-reduced representation of my compounds that still retains a large portion of the information.
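To make the issue concrete, here is a minimal sketch of the kind of vector I mean and of what goes wrong with a log-based transform. It assumes NumPy; the element list is deliberately truncated, and the names `ELEMENTS`, `composition_vector` and `clr` are only illustrative, not taken from any CoDa library. I use the centred log-ratio (clr) transform as one example of the log transformations in question.

```python
import numpy as np

# Hypothetical, truncated enumeration of elements; a real one would
# cover the whole periodic table.
ELEMENTS = ["H", "C", "N", "O", "Na", "Cl", "K"]

def composition_vector(counts):
    """Map element counts, e.g. {"Na": 1, "Cl": 1}, to fractional abundances."""
    vec = np.zeros(len(ELEMENTS))
    for symbol, n in counts.items():
        vec[ELEMENTS.index(symbol)] = n
    return vec / vec.sum()  # parts sum to 1

def clr(x):
    """Centred log-ratio transform: log(x) minus the mean of log(x).

    Only well defined when every part is strictly positive.
    """
    # Suppress the warnings so the non-finite result can be inspected.
    with np.errstate(divide="ignore", invalid="ignore"):
        log_x = np.log(x)  # log(0) evaluates to -inf
        return log_x - log_x.mean()

nacl = composition_vector({"Na": 1, "Cl": 1})
nacl_clr = clr(nacl)

print(nacl)      # 0.5 at the Na and Cl positions, 0 elsewhere
print(nacl_clr)  # no finite entries, because of the zero parts
```

The vector itself is a perfectly valid set of fractions summing to 1, but every entry of the transformed vector is non-finite, which is exactly the obstacle described above.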
Many thanks,
James