I'm new to machine learning, and I don't fully understand how continuous data should be transformed before modelling.

Suppose I have six pandas columns of continuous data, for instance [age, weight, mean_blood_glucose, std_blood_glucose, skew_blood_glucose, kurt_blood_glucose]. If I apply a Box-Cox power transform (or a plain log transform) to mean_blood_glucose because it is heavily skewed, do I also have to apply the same transformation to all the other continuous columns, or is it fine to choose a different transformation for each column based on its skewness and outliers?
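
To make the per-column option concrete, here is a minimal sketch that applies Box-Cox to mean_blood_glucose only and leaves the other columns untouched. It assumes scikit-learn's `PowerTransformer` and uses synthetic stand-in data; the values are made up for illustration and are not real measurements.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import PowerTransformer

# Synthetic stand-in data (assumption: made-up values, not from a real dataset).
rng = np.random.default_rng(0)
n = 500
df = pd.DataFrame({
    "age": rng.integers(20, 80, size=n).astype(float),
    "weight": rng.normal(75.0, 12.0, size=n),
    # Log-normal draws are strictly positive and right-skewed,
    # which is what Box-Cox requires and is meant to correct.
    "mean_blood_glucose": rng.lognormal(mean=4.7, sigma=0.6, size=n),
})

# Fit Box-Cox on the skewed column only; standardize=False keeps the
# transformed values on the Box-Cox scale without rescaling.
boxcox = PowerTransformer(method="box-cox", standardize=False)
transformed = df.copy()
transformed["mean_blood_glucose"] = np.asarray(
    boxcox.fit_transform(df[["mean_blood_glucose"]])
).ravel()

skew_before = df["mean_blood_glucose"].skew()
skew_after = transformed["mean_blood_glucose"].skew()
print(f"skewness before: {skew_before:.2f}, after: {skew_after:.2f}")
print(f"fitted Box-Cox lambda: {boxcox.lambdas_[0]:.2f}")
```

Note that Box-Cox only accepts strictly positive values, so it could not be applied as-is to a column such as skew_blood_glucose if that column contains zeros or negative values.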

The same goes for scaling, for instance normalization with a MinMax scaler: is it a good idea to scale every feature to the range [0, 1]?
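
For reference, this is a small sketch of what the two scalers do to a single column, assuming scikit-learn's `MinMaxScaler` and `StandardScaler`. The weights are hypothetical, and the last one is a deliberate outlier.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Hypothetical weights in kg (assumption); 400 is a deliberate outlier.
weight = np.array([[60.0], [70.0], [80.0], [90.0], [400.0]])

# Min-max scaling maps the column minimum to 0 and the maximum to 1.
minmax = MinMaxScaler(feature_range=(0, 1)).fit_transform(weight)

# Standardization subtracts the mean and divides by the standard deviation.
standard = StandardScaler().fit_transform(weight)

print("min-max :", minmax.ravel())
print("standard:", standard.ravel())
```

With the outlier present, min-max scaling squeezes the four ordinary values into a narrow band near 0, which is one reason the choice of scaler may reasonably differ from column to column.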

Also, what about the X_blood_glucose columns (mean, std, skew, kurt)? They are all related. Is there a special approach for features that are so strongly related mathematically?

Jack Avante
  • No. If it makes sense to work with log mean, then what else possibly makes sense is the SD of that transform, and so on. Also, depending on how they are measured, skewness can be negative or zero, and kurtosis can be negative or zero if you are subtracting 3, so the logarithm of skewness or kurtosis may not even be (usefully) defined. An easier counter-example to the myth of transforming all variables the same way is any indicator variable with distinct values 0 and 1. Here no one-to-one transformation can possibly improve anything whatsoever, and in particular log 0 is not even defined. – Nick Cox Nov 19 '20 at 10:15
  • Best to ask one question at a time. Note that the tags you have added all have numerous previous questions. – Nick Cox Nov 19 '20 at 10:16
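
The two counter-examples in the first comment can be checked with a few lines of plain Python. This is only an illustrative sketch: `safe_log`, the skewness values, and the remapping of the indicator are all hypothetical and chosen for the demonstration.

```python
import math

def safe_log(x):
    """Return log(x), or None where the logarithm is undefined (x <= 0)."""
    return math.log(x) if x > 0 else None

# Hypothetical skewness values: positive, zero, and negative.
skewness_values = [1.8, 0.0, -0.4]
logged = [safe_log(s) for s in skewness_values]
print(logged)  # only the positive value has a logarithm

# A 0/1 indicator under an arbitrary one-to-one remapping (0 -> 5.0, 1 -> -2.0).
indicator = [0, 1, 1, 0, 1]
remapped = [5.0 if v == 0 else -2.0 for v in indicator]

# A one-to-one map only relabels the two values; there are still exactly two.
distinct_before = len(set(indicator))
distinct_after = len(set(remapped))
print(distinct_before, distinct_after)
```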

0 Answers