In the first phase of my analysis, I transformed a skewed variable (it violated both normality and homogeneity of variance) using a power transform (x^-2) so that I could run parametric tests.
Skewed variables boxplots:
Transformed variables boxplots:
In the second phase of my analysis, I intend to correlate these variables with another set of variables. (I have not collected those variables yet, but I predict an inverse linear relationship.)
As you can see in the boxplots, the transformation reversed the ordering of the variable across the groups (i.e., instead of y increasing with x, y now decreases with x). This is expected, since x^-2 is a decreasing function for positive values.
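To make the reversal concrete, here is a minimal sketch on simulated data (not my real data). The names `x`, `y`, `y_t` and the helper `pearson_r` are made up for illustration, and the lognormal noise is only an assumption to get a positive, right-skewed variable. It shows Pearson's r changing sign once the variable is raised to the power -2.

```python
import random
import statistics

def pearson_r(a, b):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    ss_a = sum((u - mean_a) ** 2 for u in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (ss_a * ss_b) ** 0.5

random.seed(0)

# Hypothetical data: x is the predictor, y is positive, right-skewed,
# and tends to increase with x.
x = [random.uniform(1, 10) for _ in range(200)]
y = [xi * random.lognormvariate(0, 0.3) for xi in x]

# The power transform from the question: y^-2 (decreasing for positive y).
y_t = [yi ** -2 for yi in y]

r_raw = pearson_r(x, y)
r_transformed = pearson_r(x, y_t)

print(f"Pearson r, raw y:         {r_raw:.3f}")
print(f"Pearson r, transformed y: {r_transformed:.3f}")
```

The two coefficients have opposite signs, and their magnitudes generally differ too, because the transform changes the shape of the relationship and not just its direction.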
I am wondering whether I need to stick with the same transformed data for the second phase of the analysis to avoid misleading readers, at the cost of Pearson correlation values that are effectively backward and very difficult to interpret.
1. Alternatively, is it acceptable to report the Pearson correlations (even if their sign is backward) and then comment only on the strength of the relationship?
2. Alternatively again, could I compute Pearson's correlation on the untransformed variable, report that r value, and then report the significance value from the transformed data (whilst making it clear where each number comes from)?
The rationale for option 2 is based on a paper by Akoglu (2018), which says: "For non-normal distributions (for data with extreme values, outliers), correlation coefficients should be calculated from the ranks of the data, not from their actual values. The coefficients designed for this purpose are Spearman's rho (denoted as rs) and Kendall's Tau. In fact, normality is essential for the calculation of the significance and confidence intervals, not the correlation coefficient itself."
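The rank-based coefficients mentioned in that quote can be checked against the transform on simulated data. This is only a sketch: the helpers `ranks`, `pearson_r` and `spearman_rho` are hand-rolled for illustration, and the data is made up. Because x^-2 is strictly decreasing for positive values, it exactly reverses the ranks, so Spearman's rho should keep its magnitude and only flip sign.

```python
import random
import statistics

def ranks(values):
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over any run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def pearson_r(a, b):
    """Plain Pearson correlation coefficient of two equal-length lists."""
    mean_a, mean_b = statistics.fmean(a), statistics.fmean(b)
    cov = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    ss_a = sum((u - mean_a) ** 2 for u in a)
    ss_b = sum((v - mean_b) ** 2 for v in b)
    return cov / (ss_a * ss_b) ** 0.5

def spearman_rho(a, b):
    """Spearman's rho computed as Pearson's r on the ranks."""
    return pearson_r(ranks(a), ranks(b))

random.seed(1)

# Hypothetical positive, right-skewed data (not the real study data).
x = [random.uniform(1, 10) for _ in range(200)]
y = [xi * random.lognormvariate(0, 0.3) for xi in x]
y_t = [yi ** -2 for yi in y]  # the x^-2 power transform from the question

rho_raw = spearman_rho(x, y)
rho_transformed = spearman_rho(x, y_t)

print(f"Spearman rho, raw y:         {rho_raw:.4f}")
print(f"Spearman rho, transformed y: {rho_transformed:.4f}")
```

So with a rank-based coefficient, the transformed and untransformed variables carry the same information about the strength of the association, and the sign can be read off the untransformed scale.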
I hope I've made the question(s) clear, thanks in advance!