I'm testing whether a feature's importance is related to the correlation between that feature and the target.
I'm working on the Titanic dataset.
I plotted the feature importance (using `xgboost`).
- I checked whether there is a relation (correlation) between the top 2 most important features (`Fare`, `Age`) and the target (`Survived`).
- I also checked the least important feature (`Sex`) against the target (`Survived`).
- I used 3 different correlation methods.
Results:
Type: pearson, fare cor: 0.2573065223849625
Type: pearson, Age cor: -0.06980851528714314
Type: pearson, Sex cor: -0.5433513806577555
Type: spearman, fare cor: 0.32373613944480834
Type: spearman, Age cor: -0.03910946205127973
Type: spearman, Sex cor: -0.5433513806577551
Type: kendall, fare cor: 0.2662286416742869
Type: kendall, Age cor: -0.03268974393136027
Type: kendall, Sex cor: -0.5433513806577552
As the results show, there seems to be no relation at all between how important a feature is and how strongly it correlates with the target.
- Am I right?
- If so, when is it a good idea to use correlation? (In this example, whether a feature is correlated with the target or not doesn't seem to affect the results.)