When I am given a variable, I usually decide whether to take its logarithm based on gut feeling. Typically I look at its distribution: if it has a long tail (as with salaries, GDP, ...), I take the logarithm.
However, when I need to preprocess a large number of variables, I resort to ad hoc techniques. With some tweaking I can arrive at the "desired" results, but without a good justification.
Is there a common or widely accepted way to decide whether to transform a (single) variable with a logarithm (or, say, a square root)?
Of course, for more refined techniques the appropriate scaling depends on the method used, the meaning of particular parameters, and the relations between them. But for deciding, e.g., whether to use a log scale in a plot, the distribution of a single variable should suffice.
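For concreteness, the gut-feeling rule I described above looks roughly like the sketch below (the skewness threshold of 1 and the helper name `suggest_log_scale` are my own arbitrary choices, not an established convention):

```python
import numpy as np
from scipy.stats import skew

def suggest_log_scale(x, skew_threshold=1.0):
    """Ad hoc heuristic: suggest a log scale for a positive, right-skewed variable."""
    x = np.asarray(x, dtype=float)
    if np.any(x <= 0):
        return False  # log is undefined for non-positive values
    return skew(x) > skew_threshold  # sample skewness as a "long tail" proxy

# Example: a long-tailed variable (log-normal, as salaries often are)
rng = np.random.default_rng(0)
salaries = rng.lognormal(mean=10, sigma=1, size=1000)
print(suggest_log_scale(salaries))  # True for this sample
```

The threshold is exactly the kind of tweakable knob I would like to replace with a principled rule.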
Requirements:
- It should be relatively method-agnostic (I can do further rescaling if needed).
- It should be based only on the distribution of the values (not, e.g., the semantics of the data).
- It should be a sensible rule for choosing scales in plots.
I know:
- In linear regression, when is it appropriate to use the log of an independent variable instead of the actual values? - Stats.SE
- When are Log scales appropriate? - Stats.SE
- The Box-Cox transformation and variance stabilization (though, as far as I can tell, it is usually presented in a regression setting, i.e. with 2+ variables).
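(Regarding the last point: a single-variable Box-Cox fit is easy to compute, e.g. with `scipy.stats.boxcox`, and one could imagine reading the estimated lambda as a scale suggestion: near 0 for log, near 0.5 for square root, near 1 for no transform. The sketch below is my own ad hoc mapping, not an accepted rule, which is exactly why I am asking.)

```python
import numpy as np
from scipy.stats import boxcox

def nearest_transform(x):
    """Fit Box-Cox lambda by maximum likelihood, then snap to a common transform."""
    _, lam = boxcox(np.asarray(x, dtype=float))  # requires strictly positive data
    candidates = {0.0: "log", 0.5: "sqrt", 1.0: "identity"}  # my ad hoc candidate set
    best = min(candidates, key=lambda c: abs(c - lam))
    return lam, candidates[best]

rng = np.random.default_rng(0)
lam, name = nearest_transform(rng.lognormal(mean=10, sigma=1, size=1000))
print(f"lambda = {lam:.2f} -> {name}")  # lambda near 0 -> "log" here
```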