I am working with a dataset that has both numerical and categorical features. I have seen this post, which discusses the problem in R, and was wondering if someone could recommend an equivalent approach in Python (pandas / scikit-learn).

I am trying to find the best way to examine the correlation among all of the features in my dataset. Currently I can only see the correlations among the numeric features, for example with pandas and seaborn:

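 import seaborn as sns
 # correlation matrix over the numeric columns only
 c = data.select_dtypes(include='number').corr()
 sns.heatmap(c, annot=True, cmap='Greens')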

Thank you in advance

Afia R. S.

1 Answer

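You can try pandas.factorize to get a numerical representation of the categorical variables. It works on one column at a time, so apply it to each categorical column; then data.corr() will give the correlation among all the features (numerical and categorical). Keep in mind that the integer codes follow the order in which the categories first appear, so the resulting coefficients are easiest to interpret for binary or ordinal categories.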

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.factorize.html
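As a minimal sketch of the factorize approach, assuming a small made-up DataFrame (the column names age, income and city are hypothetical and only there for illustration):

```python
import pandas as pd

# Hypothetical toy data; replace with your own DataFrame.
data = pd.DataFrame({
    "age": [23, 35, 41, 29, 52, 38],
    "income": [30.0, 52.5, 61.0, 44.0, 80.0, 58.0],
    "city": ["A", "B", "A", "C", "B", "C"],
})

encoded = data.copy()
# Treat every non-numeric column as categorical (an assumption of this sketch).
for col in encoded.select_dtypes(exclude="number").columns:
    # factorize returns integer codes plus the unique values they stand for
    codes, uniques = pd.factorize(encoded[col])
    encoded[col] = codes

# Pearson correlation over the numeric columns and the integer codes
c = encoded.corr()
print(c)
```

The resulting c can be passed to sns.heatmap exactly as in the question.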

You might also want to read the post "The search for categorical correlation" by Shaked Zychlinski on the Towards Data Science blog, which looks at association measures designed for categorical data: https://towardsdatascience.com/the-search-for-categorical-correlation-a1cf7f1888c9
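One measure often used for a pair of categorical variables is Cramér's V, which is based on the chi-squared statistic of their contingency table. Below is a rough sketch of it, not taken from the post: the helper name cramers_v and the toy series are my own, and it uses the plain formula without any bias correction.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

def cramers_v(x, y):
    """Cramér's V between two categorical series (plain, uncorrected formula)."""
    table = pd.crosstab(x, y)                      # contingency table of counts
    chi2 = chi2_contingency(table, correction=False)[0]
    n = table.to_numpy().sum()                     # total number of observations
    r, k = table.shape
    # Undefined if either variable has a single category (division by zero).
    return np.sqrt(chi2 / (n * (min(r, k) - 1)))

# Hypothetical toy series for illustration.
x = pd.Series(["a", "a", "b", "b", "c", "c"])
y = pd.Series(["u", "u", "v", "v", "w", "w"])      # fully determined by x
z = pd.Series(["p", "q", "p", "q", "p", "q"])      # unrelated to x

v_dependent = cramers_v(x, y)
v_independent = cramers_v(x, z)
print(v_dependent, v_independent)
```

The value ranges from 0 (no association) to 1 (one variable fully determines the other). Unlike data.corr(), it is always non-negative and only applies to categorical pairs.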

I hope this helps.

Kaushal Sharma