I've gone through this CV post, and my basic understanding is that we apply a log transformation when we care about relative changes, and also to reduce positive skewness in the data.
For example, imagine a dataset like this:
Quantity  Area  TotalSF  Year  Price
7         1710  856      2003  321510
6         1262  1262     1976  183190
7         1786  920      2001  228410
7         1717  756      1915  171230
8         2198  1145     2000  201000
After log-transforming 'Area', 'Price' and 'TotalSF', their plots look approximately normally distributed. The log-transformed data looks something like this:
Quantity  Area      TotalSF   Year  Price
7         7.444249  6.752270  2003  18.34
6         7.140453  7.140453  1976  17.75
7         7.487734  6.824374  2001  17.92
7         7.448334  6.628041  1915  17.43
8         7.695303  7.043160  2000  18.12
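For reference, the transformation can be sketched as below. This is only a minimal illustration, assuming the natural log (base e), which is what reproduces the 'Area' and 'TotalSF' values in the table; the variable names are my own, and 'Quantity' and 'Year' are left untouched.

```python
import math

# Raw columns copied from the first table above.
area = [1710, 1262, 1786, 1717, 2198]
total_sf = [856, 1262, 920, 756, 1145]

# Natural log (base e) of each value; assumed to be the transform used.
log_area = [math.log(v) for v in area]
log_total_sf = [math.log(v) for v in total_sf]

# Print the transformed columns side by side, six decimals as in the table.
for a, t in zip(log_area, log_total_sf):
    print(f"{a:.6f}  {t:.6f}")
```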
My questions are:
- Should we train the model on the log-transformed data?
- If so, do we need to log-transform the test data as well?
- How do we recover the original values, say for the variable 'Price', after log-transforming it?
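To make the questions concrete, here is a rough sketch of the workflow I have in mind. It is only an illustration under my own assumptions: a single-feature least-squares fit of log('Price') on log('Area'), with the first four rows as training data and the last row as a stand-in test row. None of this is meant as a recommended model.

```python
import math

# Rows from the first table; the 4/1 split is purely illustrative.
area = [1710, 1262, 1786, 1717, 2198]
price = [321510, 183190, 228410, 171230, 201000]

# Train on log-transformed feature and target.
train_x = [math.log(a) for a in area[:4]]
train_y = [math.log(p) for p in price[:4]]

# Ordinary least squares for log(Price) = intercept + slope * log(Area).
mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
sxx = sum((x - mean_x) ** 2 for x in train_x)
slope = sxy / sxx
intercept = mean_y - slope * mean_x

# The test row gets the same transform as the training rows.
test_x = math.log(area[4])
pred_log_price = intercept + slope * test_x

# exp() inverts the natural log, taking the prediction back to the price scale.
pred_price = math.exp(pred_log_price)
print(f"predicted log-price: {pred_log_price:.4f}")
print(f"predicted price:     {pred_price:.2f}")
```

In this sketch the prediction is made on the log scale and `math.exp` is the step that would bring it back to the original units.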
Edit:
This question specifically asks whether to train a model on log-transformed data and how to get the actual values back, so it is more of a beginner-friendly question. The question used to flag this one as a duplicate, on the other hand, asks whether back-transforming is valid at all.
That's a completely different premise, I believe.