I have been using statistics for years (traditional null-hypothesis testing, publishing p-values in our reports, etc.), but I admittedly mostly follow 'kitchen recipes'. I have trouble understanding data transformation, and I can't find good, convincing explanations in favor of its use.

In particular, I find it hard to understand how we can draw conclusions from transformed data and apply them to the untransformed data. For example, suppose I have a response variable B and I'm comparing 4 treatments (4 levels of a factor, say I, II, III and IV). I go for a one-way ANOVA first, but my response variable B is not normally distributed. So I transform it with an appropriate transformation, say log(B). Normality is then achieved and I can go ahead with the test. Let's imagine the ANOVA yields significant differences, and a post-hoc test says that B from I and IV do not differ from each other and are both greater than from III and II. Then we conclude that treatments I and IV produce a larger B. This is a very simple case, of course, but what I fail to fully understand is: if the tests are done with log(B) as the response variable (not B), then why do we conclude things about B?
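
To make the example concrete, here is a minimal sketch of the workflow I mean (Python with NumPy/SciPy; the data are simulated and purely illustrative):

```python
# Simulated version of the scenario above: four treatments, a lognormal
# response B, and a one-way ANOVA run on log(B) rather than on B itself.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# B is lognormal, so log(B) is normal (which is why the transformation helps here).
groups = {
    "I":   rng.lognormal(mean=1.0, sigma=0.5, size=30),
    "II":  rng.lognormal(mean=0.3, sigma=0.5, size=30),
    "III": rng.lognormal(mean=0.3, sigma=0.5, size=30),
    "IV":  rng.lognormal(mean=1.0, sigma=0.5, size=30),
}

# One-way ANOVA on the transformed response log(B).
f_stat, p_value = stats.f_oneway(*(np.log(b) for b in groups.values()))
print(f"ANOVA on log(B): F = {f_stat:.2f}, p = {p_value:.4f}")
```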

I get that normality may not be a great problem, but that's not the point; let's assume it is mandatory, just for the sake of the argument.

I have read that many things we measure are already transformations, like pH. Yes, that is true, but then we speak about changes in pH, not in the hydrogen-ion concentration. The two are related, of course, but they are not the same. So sometimes I fear that researchers put too much faith in the "they are related" part, and it ends up being treated as if they were "just the same". Maybe log is a familiar operation for many people, but other transformations, like the arcsine or square root, are much more obscure. Are we not losing sight of the original data (and their meaning)?
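
To make that concrete: pH is itself a (negative) log transformation, so a statement about a pH difference is really a statement about a *ratio* of hydrogen-ion concentrations:

$$\mathrm{pH} = -\log_{10}[\mathrm{H^+}], \qquad \mathrm{pH}_2 - \mathrm{pH}_1 = \log_{10}\frac{[\mathrm{H^+}]_1}{[\mathrm{H^+}]_2}.$$

As I understand it, the same holds for log(B): a difference in mean $\log(B)$ between two treatments corresponds to a ratio of geometric means of $B$, not a difference of arithmetic means.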

A final point is that many people apparently inspect the raw data for normality, when in reality what needs to be normal is the residuals, not the raw data.
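
Here is a sketch of what I understand the correct check to be (again simulated data; pandas/statsmodels are just my choice of tooling):

```python
# Check normality of the model residuals, not of the raw response B.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf
from scipy import stats

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "treatment": np.repeat(["I", "II", "III", "IV"], 30),
    "B": rng.lognormal(mean=1.0, sigma=0.5, size=120),
})

# Fit the one-way ANOVA model on log(B) and extract its residuals.
model = smf.ols("np.log(B) ~ C(treatment)", data=df).fit()

# Shapiro-Wilk on the residuals: these are what the normality assumption is about.
w_stat, p_value = stats.shapiro(model.resid)
print(f"Shapiro-Wilk on residuals: W = {w_stat:.3f}, p = {p_value:.4f}")
```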

In conclusion, I am afraid a large proportion of people (including my past self) do not really know what they're dealing with when transforming data.

A specific question I always have is: given that I don't fully understand/trust data transformation, should I always resort to non-parametric tests rather than transforming? Or am I losing too much in that trade-off?
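
For reference, the non-parametric route I have in mind is something like this (SciPy's Kruskal-Wallis on the raw response; same simulated setup as above):

```python
# Kruskal-Wallis on the raw, untransformed response B. Since the test
# uses only ranks, it is invariant to monotone transformations: running
# it on B or on log(B) gives identical results.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
groups = [rng.lognormal(mean=m, sigma=0.5, size=30) for m in (1.0, 0.3, 0.3, 1.0)]

h_stat, p_value = stats.kruskal(*groups)
print(f"Kruskal-Wallis on raw B: H = {h_stat:.2f}, p = {p_value:.4f}")
```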

terauser
  • Your point is a valid one, and there might be situations where transformations will not help. But, as long as you are only interested in testing the null of equality, then, if the $\log Y$ differs among groups, surely the $Y$ must differ among groups? Maybe https://stats.stackexchange.com/questions/18844/when-and-why-should-you-take-the-log-of-a-distribution-of-numbers answers your Q? – kjetil b halvorsen Dec 22 '20 at 00:43
  • A data "transformation" is merely a different way of recording the numbers. Thus, the question isn't whether to transform or whether a transformation can be "trusted:" rather, it concerns *what useful ways can be found to express the data for the intended analyses.* – whuber Dec 22 '20 at 14:09

0 Answers