I'm aware that we often transform data to make it easier to analyze; for instance, we sometimes apply the logarithm function. I'm not sure exactly why this is done in general, but I know that, in the case of financial equities data, it makes the data "more normally distributed", which in turn makes the data easier to analyze with theorems/techniques that assume normally distributed data.
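To make the "more normally distributed" claim concrete, here is a small simulation sketch (the lognormal "prices" are a made-up stand-in for real equities data, not an actual dataset):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical price-like data: lognormal, hence strongly right-skewed.
prices = rng.lognormal(mean=0.0, sigma=0.5, size=10_000)
log_prices = np.log(prices)

# The log transform removes (most of) the skew.
print("skewness before log:", stats.skew(prices))      # clearly positive
print("skewness after log: ", stats.skew(log_prices))  # close to 0
```

If the raw data really are (approximately) lognormal, then the log-transformed data are (approximately) normal by definition, which is presumably why this particular transformation is the standard choice for such data.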
I've been wondering about two closely related questions: (1) why is this a valid thing to do? (2) how do we know, in general, that any given transformation is valid to apply? In pondering these two questions, I recall a number of things that might be relevant:
I recall that, in probability theory, we can transform random variables (see Chapter 8, "Transformations", of Introduction to Probability, second edition, by Blitzstein and Hwang). But do these transformations preserve the fundamental structure of the data? Mathematically, I'm not completely sure what condition would let us definitively say that a transformation preserves the fundamental structure of the data, but I think it is something along the lines of the function being one-to-one, or perhaps linear (rather than nonlinear); see the change-of-variables formula I've written out after this list.
Related to the previous point, I am reminded of exercise 1.4 from Pattern Recognition and Machine Learning by Christopher Bishop (see my questions here and here).
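To state the result from Blitzstein and Hwang that I have in mind: if $g$ is strictly monotonic and differentiable, and $Y = g(X)$, then the density transforms as

$$f_Y(y) = f_X\big(g^{-1}(y)\big)\,\left|\frac{d}{dy}\,g^{-1}(y)\right|.$$

As I understand it, the Jacobian factor reweights the density exactly so that probabilities are preserved: $P(X \in A) = P\big(g(X) \in g(A)\big)$ for any set $A$. So when $g$ is one-to-one, no probabilistic information is lost, even though the shape of the density (and hence density-dependent quantities) can change.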
Having thought about (1), I suspect that it has to do with whether or not the transformation changes the fundamental structure of the data. Fundamentally, our goal in analyzing data is to elucidate and analyze patterns in it. But when we transform the data, how do we know that the operation preserves the fundamental structure/information of the data, so as not to change or destroy the signal/patterns embedded within it? I suspect that a valid transformation alters, in some sense, only the "superficial" structure of the data, and not the "fundamental" structure.
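Here is one concrete version of that "superficial vs. fundamental" intuition that occurred to me (my own toy example, not from either book): a strictly increasing transformation preserves all rank-based structure (ordering, quantiles, Spearman correlation), while moment-based summaries such as the Pearson correlation do change:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

x = rng.normal(size=2_000)
y = x + rng.normal(scale=0.5, size=2_000)  # a noisy monotone "pattern"

x_t = np.exp(x)  # strictly increasing (hence one-to-one) but nonlinear

# Rank-based structure survives the transformation exactly...
print("Spearman before:", stats.spearmanr(x, y)[0])
print("Spearman after: ", stats.spearmanr(x_t, y)[0])  # identical

# ...while the linear (moment-based) summary does not.
print("Pearson before: ", stats.pearsonr(x, y)[0])
print("Pearson after:  ", stats.pearsonr(x_t, y)[0])   # changed
```

So at least for monotone transformations, one candidate answer seems to be: the rank structure of the data is the part that is preserved.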
Having thought about (2), I suspect that it has to do with, as I described above, whether the transformation is one-to-one, linear, nonlinear, and so on.
As you can see, my thoughts on this matter are quite vague and uncertain. I would greatly appreciate it if people would take the time to provide a more definitive and clear explanation, including the relevant mathematics (particularly with regard to (2)).
EDIT:
With regard to monotonicity, my understanding is that, although strictly monotonic functions are one-to-one, they are not necessarily linear. And if you look at my links to my math.stackexchange questions about exercise 1.4 of Bishop, you'll see that nonlinear transformations of data lead to certain problems (in particular, some information, such as the location of a density's maximum, is not preserved).
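To illustrate the specific kind of information that Bishop's exercise 1.4 is about (as I understand it): under a nonlinear change of variables, the mode of a density does not map through the transformation, because of the Jacobian factor in the change-of-variables formula above. A small numerical check using a lognormal/normal pair (my own example):

```python
import numpy as np
from scipy import stats

mu, sigma = 0.0, 1.0

# X ~ lognormal(mu, sigma), so Y = log(X) ~ Normal(mu, sigma**2).
mode_of_Y = mu                      # mode of the normal density
mode_of_X = np.exp(mu - sigma**2)   # known mode of the lognormal density

# If modes simply mapped through g = log, log(mode_of_X) would equal mode_of_Y.
print("log(mode of X):", np.log(mode_of_X))  # -1.0
print("mode of Y:     ", mode_of_Y)          #  0.0  -> they differ

# Numerical confirmation of the lognormal mode via a grid search.
grid = np.linspace(0.01, 5.0, 100_000)
pdf = stats.lognorm.pdf(grid, s=sigma, scale=np.exp(mu))
print("numerical mode of X:", grid[np.argmax(pdf)])  # ~ exp(-1) ≈ 0.368
```

So even though $g = \log$ is one-to-one and no probability is lost, a density-dependent quantity like the mode is not preserved, which seems to be exactly the "loss" that exercise 1.4 demonstrates.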