In this IPython Notebook that I'm following, the author says that we should perform imputation based on the median values (instead of mean) because the variable is right skewed. I'm not sure I completely understand this. Could someone please explain to me why the median works better if the variable is skewed?
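One common rationale can be checked numerically. The sketch below is only an illustration under assumptions: the sample values are made up (they are not taken from the notebook or from any real dataset), `impute` is a hypothetical helper, and this may not be the reasoning the notebook's author had in mind. It shows that in a right-skewed sample a few large values pull the mean well above most of the data, while the median stays near a typical observation.

```python
from statistics import mean, median

# Hypothetical right-skewed sample (made-up values, not from the notebook);
# None marks a missing entry.
values = [7, 8, 8, 10, None, 13, 15, 26, 30, None, 80, 512]

# Compute candidate fill values from the observed entries only.
observed = [v for v in values if v is not None]
fill_mean = mean(observed)
fill_median = median(observed)


def impute(data, fill):
    """Replace every None in data with the given fill value."""
    return [fill if v is None else v for v in data]


imputed_mean = impute(values, fill_mean)
imputed_median = impute(values, fill_median)

# Share of observed values that lie strictly below each candidate fill value.
below_mean = sum(v < fill_mean for v in observed) / len(observed)
below_median = sum(v < fill_median for v in observed) / len(observed)

print("mean fill:", fill_mean, "share of data below it:", below_mean)
print("median fill:", fill_median, "share of data below it:", below_median)
```

In this made-up sample the mean sits above most of the observed values, so mean imputation would assign each missing entry a value larger than what most cases actually have. Whether either single-value fill is appropriate still depends on the goal of the analysis.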
- The reasoning is incomplete, so one has to guess. You might find some possible explanations in the answers at https://stats.stackexchange.com/questions/2547. – whuber Jul 06 '17 at 17:29
- Both are probably a pretty bad idea for imputation. – Björn Jul 06 '17 at 20:11
- It really depends on what the aim is! What is the author attempting to achieve? – Glen_b Jul 07 '17 at 03:14
- @Glen_b I recognize this data from the [Titanic ML challenge on Kaggle](https://www.kaggle.com/c/titanic). It's a binary classification problem (passengers either survived or they didn't). – julian Jul 07 '17 at 05:03