I'm aware that we often transform data to make it easier to analyze; for instance, we sometimes apply the logarithm function. I'm not sure exactly why this is done in general, but I know that, in the case of financial equities data, it makes the data "more normally distributed", which in turn makes the data easier to analyze with theorems/techniques that assume normally distributed data.
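To make the "more normally distributed" claim concrete, here is a small simulation sketch (the lognormal "prices" are a made-up stand-in for real equities data, not an actual dataset):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Hypothetical price-like data: lognormal, hence strongly right-skewed.
prices = rng.lognormal(mean=0.0, sigma=0.5, size=10_000)
log_prices = np.log(prices)

# The log transform removes (most of) the skew.
print("skewness before log:", stats.skew(prices))      # clearly positive
print("skewness after log: ", stats.skew(log_prices))  # close to 0
```

If the raw data really are (approximately) lognormal, then the log-transformed data are (approximately) normal by definition, which is presumably why this particular transformation is the standard choice for such data.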
I've been wondering about two closely related questions: (1) why is this a valid thing to do? (2) how do we know, in general, that any given transformation is valid to apply? In pondering these two questions, I recall a number of things that might be relevant:
I recall that, in probability theory, we can transform random variables (see Chapter 8, "Transformations", of Introduction to Probability, second edition, by Blitzstein and Hwang). But do these transformations preserve the fundamental structure of the data? Mathematically, I'm not completely sure what condition would let us definitively say that a transformation preserves the fundamental structure of the data, but I think it is something along the lines of the function being one-to-one, or perhaps linear (rather than nonlinear); see the change-of-variables formula I've written out after this list.
Related to the previous point, I am reminded of exercise 1.4 from Pattern Recognition and Machine Learning by Christopher Bishop (see my questions here and here).
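To state the result from Blitzstein and Hwang that I have in mind: if $g$ is strictly monotonic and differentiable, and $Y = g(X)$, then the density transforms as

$$f_Y(y) = f_X\big(g^{-1}(y)\big)\,\left|\frac{d}{dy}\,g^{-1}(y)\right|.$$

As I understand it, the Jacobian factor reweights the density exactly so that probabilities are preserved: $P(X \in A) = P\big(g(X) \in g(A)\big)$ for any set $A$. So when $g$ is one-to-one, no probabilistic information is lost, even though the shape of the density (and hence density-dependent quantities) can change.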
Having thought about (1), I suspect that it has to do with whether or not the transformation changes the fundamental structure of the data. Fundamentally, our goal in analyzing data is to elucidate and analyze patterns in it. But when we transform the data, how do we know that the operation preserves the fundamental structure/information of the data, so as not to change or destroy the signal/patterns embedded within it? I suspect that a valid transformation alters, in some sense, only the "superficial" structure of the data, and not the "fundamental" structure.
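Here is one concrete version of that "superficial vs. fundamental" intuition that occurred to me (my own toy example, not from either book): a strictly increasing transformation preserves all rank-based structure (ordering, quantiles, Spearman correlation), while moment-based summaries such as the Pearson correlation do change:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

x = rng.normal(size=2_000)
y = x + rng.normal(scale=0.5, size=2_000)  # a noisy monotone "pattern"

x_t = np.exp(x)  # strictly increasing (hence one-to-one) but nonlinear

# Rank-based structure survives the transformation exactly...
print("Spearman before:", stats.spearmanr(x, y)[0])
print("Spearman after: ", stats.spearmanr(x_t, y)[0])  # identical

# ...while the linear (moment-based) summary does not.
print("Pearson before: ", stats.pearsonr(x, y)[0])
print("Pearson after:  ", stats.pearsonr(x_t, y)[0])   # changed
```

So at least for monotone transformations, one candidate answer seems to be: the rank structure of the data is the part that is preserved.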
Having thought about (2), I suspect that it has to do with, as I described above, whether the transformation is one-to-one, linear, nonlinear, and so on.
As you can see, my thoughts on this matter are quite vague and uncertain. I would greatly appreciate it if people would take the time to provide a more definitive and clear explanation, including the relevant mathematics (particularly with regard to (2)).
EDIT:
With regard to monotonicity, my understanding is that, although strictly monotonic functions are one-to-one, they are not necessarily linear. And if you look at my links to my math.stackexchange questions about exercise 1.4 of Bishop, you'll see that nonlinear transformations of data lead to certain problems (in particular, some information, such as the location of a density's maximum, is not preserved).
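To illustrate the specific kind of information that Bishop's exercise 1.4 is about (as I understand it): under a nonlinear change of variables, the mode of a density does not map through the transformation, because of the Jacobian factor in the change-of-variables formula above. A small numerical check using a lognormal/normal pair (my own example):

```python
import numpy as np
from scipy import stats

mu, sigma = 0.0, 1.0

# X ~ lognormal(mu, sigma), so Y = log(X) ~ Normal(mu, sigma**2).
mode_of_Y = mu                      # mode of the normal density
mode_of_X = np.exp(mu - sigma**2)   # known mode of the lognormal density

# If modes simply mapped through g = log, log(mode_of_X) would equal mode_of_Y.
print("log(mode of X):", np.log(mode_of_X))  # -1.0
print("mode of Y:     ", mode_of_Y)          #  0.0  -> they differ

# Numerical confirmation of the lognormal mode via a grid search.
grid = np.linspace(0.01, 5.0, 100_000)
pdf = stats.lognorm.pdf(grid, s=sigma, scale=np.exp(mu))
print("numerical mode of X:", grid[np.argmax(pdf)])  # ~ exp(-1) ≈ 0.368
```

So even though $g = \log$ is one-to-one and no probability is lost, a density-dependent quantity like the mode is not preserved, which seems to be exactly the "loss" that exercise 1.4 demonstrates.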