In an IPython Notebook that I'm following, the author says that we should impute missing values with the median (instead of the mean) because the variable is right-skewed. I'm not sure I completely understand this. Could someone please explain to me why the median works better if the variable is skewed?
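
To make the two options concrete, here is a minimal sketch that compares mean and median as fill values on a right-skewed sample. It assumes synthetic log-normal data rather than the notebook's actual variable, and the `impute` helper and all variable names are hypothetical, chosen only for illustration.

```python
import random
import statistics

# Synthetic right-skewed sample (log-normal). This is an assumption for
# illustration; it is NOT the data used in the notebook.
rng = random.Random(0)
observed = [rng.lognormvariate(3.0, 1.0) for _ in range(1000)]

mean_value = statistics.mean(observed)
median_value = statistics.median(observed)


def impute(values, fill):
    """Replace missing entries (None) with a single fill value."""
    return [fill if v is None else v for v in values]


# Hypothetical column with two missing entries appended.
column = observed[:5] + [None, None]
filled_with_mean = impute(column, mean_value)
filled_with_median = impute(column, median_value)

# Share of observed values lying strictly below each candidate fill value.
share_below_mean = sum(v < mean_value for v in observed) / len(observed)
share_below_median = sum(v < median_value for v in observed) / len(observed)

print(f"mean={mean_value:.1f}, median={median_value:.1f}")
print(f"share below mean={share_below_mean:.2f}, "
      f"share below median={share_below_median:.2f}")
```

In a sample like this, the long right tail pulls the mean above the median, so a mean-filled value sits higher than most of the observed values, while a median-filled value sits in the middle of them. Whether that makes the median the better choice still depends on what the imputed variable is used for.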

kjetil b halvorsen
bugsyb
  • The reasoning is incomplete, so one has to guess. You might find some possible explanations in the answers at https://stats.stackexchange.com/questions/2547. – whuber Jul 06 '17 at 17:29
  • Both are probably a pretty bad idea for imputation. – Björn Jul 06 '17 at 20:11
  • It really depends on what the aim is! What is the author attempting to achieve? – Glen_b Jul 07 '17 at 03:14
  • @Glen_b I recognize this data from the [Titanic ML challenge on Kaggle](https://www.kaggle.com/c/titanic). It's a binary classification problem (passengers either survived or they didn't). – julian Jul 07 '17 at 05:03