Imputation by median vs. mean

Asked Jul 06 '17 at 17:19

Active Aug 02 '17 at 13:05

In this IPython Notebook that I'm following, the author says that we should impute missing values with the median rather than the mean because the variable is right-skewed. I'm not sure I completely understand this. Could someone please explain why the median works better when the variable is skewed?

edited Aug 02 '17 at 13:05 by kjetil b halvorsen

asked Jul 06 '17 at 17:19 by bugsyb

The reasoning is incomplete, so one has to guess. You might find some possible explanations in the answers at https://stats.stackexchange.com/questions/2547. – whuber Jul 06 '17 at 17:29

Both are probably a pretty bad idea for imputation. – Björn Jul 06 '17 at 20:11

It really depends on what the aim is! What is the author attempting to achieve? – Glen_b Jul 07 '17 at 03:14

@Glen_b I recognize this data from the [Titanic ML challenge on kaggle](https://www.kaggle.com/c/titanic). It's a binary classification problem (passengers either survived or they didn't) – julian Jul 07 '17 at 05:03
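
As a rough illustration of the intuition the question asks about, here is a minimal sketch using only Python's standard library. The numbers are hypothetical values chosen to be right-skewed; they are not taken from the notebook or the Titanic data, and the `impute` helper is an assumption made for this example. It shows that a few large values pull the mean well above most observations, while the median stays near the bulk of the data. It does not settle whether either choice is a good imputation strategy for a given goal, which the comments above rightly question.

```python
import statistics

# Hypothetical right-skewed values: most are small, a few are very large.
observed = [7, 8, 8, 9, 10, 13, 15, 26, 30, 80, 250]

mean_value = statistics.mean(observed)
median_value = statistics.median(observed)

# Share of observed values lying strictly below each candidate fill value.
share_below_mean = sum(x < mean_value for x in observed) / len(observed)
share_below_median = sum(x < median_value for x in observed) / len(observed)


def impute(values, fill):
    """Replace None entries with a single fill value (hypothetical helper)."""
    return [fill if v is None else v for v in values]


# A small column with gaps, filled both ways for comparison.
with_gaps = [7, None, 13, 250, None]
filled_mean = impute(with_gaps, mean_value)
filled_median = impute(with_gaps, median_value)

print("mean:", mean_value, "share of data below it:", share_below_mean)
print("median:", median_value, "share of data below it:", share_below_median)
print("mean-imputed:", filled_mean)
print("median-imputed:", filled_median)
```

In this sample the mean exceeds most of the observed values, so mean-imputed entries would sit in the sparse right tail rather than where a typical observation lies; the median-imputed entries sit in the middle of the data.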

0 Answers