
I am currently working on the Titanic problem on Kaggle. The data consists of several features such as "sex", "class in society", etc., and the task is to predict whether a person survived the Titanic disaster. According to other people's analyses, which were based on graphs of the data, females were more likely to survive than males. Given that information, am I supposed to do anything so that my model knows that the "sex" feature is important? Or is the analysis only useful for data science purposes (i.e. for the sake of knowing which gender survived more often)?

hehe

1 Answer


If your data "shows" that females have a greater probability of survival, then your learning algorithm should be able to use this information. Notice that what you are saying is that you have a subjective belief that some predictors should matter, and you want to include this belief in your model. Generally, this kind of approach is possible with Bayesian methods, where your result is:

$$ \text{estimation result} \propto \text{prior knowledge} \times \text{information encountered in data} $$

However, in this approach your prior knowledge should be prior to the data, i.e. it does not make much sense to use the same information twice (first learn it from the data, and then ask your model to learn it from the data again while also telling it what you learned before doing the estimation).
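To make the double-counting problem concrete, here is a minimal sketch using a Beta prior with a Binomial likelihood, which is one simple instance of "prior times data". The counts are hypothetical and are not the real Titanic numbers; the helper names `beta_posterior` and `beta_mean_var` are made up for this illustration.

```python
def beta_posterior(prior_a, prior_b, successes, failures):
    """Conjugate update: Beta prior x Binomial likelihood -> Beta posterior."""
    return prior_a + successes, prior_b + failures


def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var


# Hypothetical counts for one group (assumption, not real Titanic data).
survived, died = 70, 30

# Proper use: a flat Beta(1, 1) prior chosen before seeing the data.
honest = beta_posterior(1, 1, survived, died)

# Double counting: the "prior" was itself learned from the same data,
# and then the same data is used once more for the update.
double = beta_posterior(*honest, survived, died)

honest_mean, honest_var = beta_mean_var(*honest)
double_mean, double_var = beta_mean_var(*double)

print("honest posterior:", honest, honest_mean, honest_var)
print("double-counted posterior:", double, double_mean, double_var)
```

In this sketch the estimated survival probability barely moves, but the double-counted posterior has a smaller variance: the model becomes more confident without having seen any new evidence.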

So the general answer is no: there is no need to both tell your algorithm what you assume based on the data and ask it to learn the same information from the data.
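As a rough illustration of the algorithm picking this up on its own, the sketch below fits a plain logistic regression by gradient descent on synthetic passengers. Everything here is an assumption made for the example: the survival rates (0.75 for females, 0.20 for males), the unrelated `noise` feature, and the choice of logistic regression itself. It is not the real Kaggle data or any particular recommended model.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Synthetic passengers with assumed survival rates (NOT real Titanic data).
rows = []
for _ in range(400):
    is_female = random.randint(0, 1)
    noise = random.uniform(-1.0, 1.0)  # feature unrelated to survival
    p_survive = 0.75 if is_female else 0.20
    survived = 1 if random.random() < p_survive else 0
    rows.append((is_female, noise, survived))


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Logistic regression fitted by batch gradient descent on the log-loss.
# No feature is marked as "important" in advance; all weights start at zero.
w_female, w_noise, bias = 0.0, 0.0, 0.0
learning_rate = 0.5
n = len(rows)
for _ in range(2000):
    grad_female = grad_noise = grad_bias = 0.0
    for is_female, noise, survived in rows:
        error = sigmoid(bias + w_female * is_female + w_noise * noise) - survived
        grad_female += error * is_female
        grad_noise += error * noise
        grad_bias += error
    w_female -= learning_rate * grad_female / n
    w_noise -= learning_rate * grad_noise / n
    bias -= learning_rate * grad_bias / n

print(f"w_female={w_female:.2f}, w_noise={w_noise:.2f}, bias={bias:.2f}")
```

The fitted weight on `is_female` ends up clearly positive and much larger in magnitude than the weight on the unrelated feature, even though the model was never told which feature matters.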

Tim