

I am reading a paper in which it is mentioned that the data were log-transformed to reduce skewness, after which they were "posterior standardized" to mean 0 and standard deviation 1 for "easy comparison" (between species). Why is this posterior standardization done (statistically), and how do I do it in R? Kindly explain!

Here is the link to the paper - Intraspecific variation in traits and tree growth along an elevational gradient in a subtropical forest


1 Answer


The paper in question used specific leaf area (area per gram), leaf area, leaf toughness (Newton), leaf thickness (µm), and wood density (gram per cubic centimeter) as predictors. The authors then used principal-components analysis (PCA) to combine information among those predictors into a smaller number of linearly independent predictors.

To do PCA properly, all of the original predictors need to be on comparable scales of measurement. Otherwise, you'd be facing a situation in which the results would differ if you, say, measured leaf thickness in millimeters instead of micrometers.
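As a quick illustration of that unit-dependence (with made-up numbers, not the paper's data), here is a small R sketch: an unscaled PCA gives different loadings depending on whether a hypothetical thickness variable is recorded in micrometers or millimeters, while a PCA on standardized predictors does not.

```r
## Toy illustration (made-up data, not from the paper): PCA loadings change
## with the measurement units unless the predictors are standardized.
set.seed(1)
thickness_um <- rnorm(50, mean = 200, sd = 30)   # leaf thickness in micrometers
density_gcm3 <- rnorm(50, mean = 0.6, sd = 0.1)  # wood density in g/cm^3

traits_um <- data.frame(thickness = thickness_um,        density = density_gcm3)
traits_mm <- data.frame(thickness = thickness_um / 1000, density = density_gcm3)

## Unscaled PCA: the first component is dominated by whichever variable
## happens to have the larger numeric variance, so the units matter.
prcomp(traits_um, scale. = FALSE)$rotation
prcomp(traits_mm, scale. = FALSE)$rotation

## Scaled PCA (each predictor centered to mean 0 and scaled to sd 1 internally):
## the loadings are the same whichever unit you started with.
prcomp(traits_um, scale. = TRUE)$rotation
prcomp(traits_mm, scale. = TRUE)$rotation
```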

A standard way to put predictors on comparable scales for PCA is to transform all of them so that they have mean values of 0 and variances/standard deviations of 1. In this paper, some of the predictors were log-transformed before that step. I find "posterior standardization" to be somewhat awkward terminology, which I interpret to mean that the transformation to 0 mean and unit variance was done after the log transformations.

You can easily do that transformation yourself if you can calculate means and standard deviations: for each predictor, subtract its mean and divide by its standard deviation. Statistical software often provides a helper function; in R, the scale() function does exactly this.
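As a minimal sketch in R (with a hypothetical trait vector, not the paper's data), the "by hand" version and scale() give the same result after a log transformation:

```r
## Hypothetical trait values (e.g., specific leaf area); log-transform first,
## then standardize to mean 0 and standard deviation 1.
sla <- c(120, 95, 210, 340, 150, 88, 400)

log_sla <- log(sla)

## "By hand": subtract the mean, divide by the standard deviation.
z_manual <- (log_sla - mean(log_sla)) / sd(log_sla)

## The same thing with R's built-in helper (scale() returns a matrix,
## so drop the attributes for a direct comparison).
z_scale <- as.numeric(scale(log_sla))

all.equal(z_manual, z_scale)                             # TRUE
round(c(mean = mean(z_manual), sd = sd(z_manual)), 10)   # mean 0, sd 1
```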

As PCA is based on variance, this type of standardization makes the most sense here. For some other data-analysis methods (e.g., neural nets), investigators might instead transform all predictors to have a minimum of 0 and a maximum of 1. The terminology for these transformations is often confusing and inconsistent, so when you read words like "normalize," "standardize," or "scale," look carefully at exactly what transformation the authors used in that particular context.
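For contrast, a minimal sketch of that min-max rescaling to [0, 1] is just as short in R; again this uses hypothetical values, not the paper's data:

```r
## Min-max "normalization": rescale a vector so its minimum is 0 and maximum is 1.
minmax <- function(x) (x - min(x)) / (max(x) - min(x))

sla <- c(120, 95, 210, 340, 150, 88, 400)  # hypothetical trait values
range(minmax(sla))                         # 0 1
```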

  • Thank you for this answer! So from the terminology "posterior standardization" I understand that they intend to normalise the data. However, isn't that exactly why we are log transforming? I'm a bit confused as to why we have both measures in place. Again, thank you so much!! – Ashish Nambiar Jun 11 '21 at 12:50
  • @AshishNambiar log transformation by itself doesn't "normalize" the data in the sense I (and the authors of that paper) use it. After "normalization" in that sense, data have a mean of 0 and a standard deviation of 1. Log-transformed data on their own don't necessarily have that type of "normalized" distribution. Log transformation does decrease the _skew_ in this type of data. So you can think of the process as: decrease skew via log transformation, then "normalize" to 0 mean and unit standard deviation. – EdM Jun 11 '21 at 13:03