
In a regression problem, we usually plot each feature against the target variable to see trends in the data and analyze them.

In a classification problem, however, such plots do not seem to show trends. What kind of analysis should be done to understand the feature characteristics?

pritywiz

1 Answer


It sounds like you're asking about visualizing the relationship between the class and each individual input feature. If that's the case, you can plot the conditional distribution of each feature, given that the class takes each particular value. Say you have features $\{x_1, \dots, x_p\}, x_i \in \mathbb{R}$ and class $y \in \{1, \dots, k\}$. For a given feature $x_i$, estimate $p(x_i \mid y=j)$ for each value of $j$. To do this, you can select the data points with the corresponding class and estimate the distribution of the desired feature using a kernel density estimator or histogram. Plot the density estimates together on the same axis, each colored according to the class. The left plot below shows an example for a binary classification problem (but this works for multiclass problems too).
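As a rough illustration of the class-conditional density plot described above, here is a minimal sketch. It assumes NumPy and SciPy are available and uses `scipy.stats.gaussian_kde` as the kernel density estimator. The synthetic data and the function names `class_conditional_densities` and `plot_class_conditional_densities` are made up for this example and are not part of the original answer.

```python
import numpy as np
from scipy.stats import gaussian_kde


def class_conditional_densities(x, y, grid):
    """Estimate p(x | y=j) on `grid` for every class j that occurs in y."""
    # Select the points of each class and fit a separate KDE to them.
    return {j: gaussian_kde(x[y == j])(grid) for j in np.unique(y)}


def plot_class_conditional_densities(x, y, grid):
    """Draw one density curve per class on a shared axis (needs matplotlib)."""
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    for j, density in class_conditional_densities(x, y, grid).items():
        ax.plot(grid, density, label=f"y = {j}")
    ax.set_xlabel("feature value")
    ax.set_ylabel("estimated p(x | y)")
    ax.legend()
    return ax


# Hypothetical binary dataset: one feature whose mean shifts with the class.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(2.0, 1.0, 200)])
y = np.concatenate([np.zeros(300, dtype=int), np.ones(200, dtype=int)])
grid = np.linspace(-4.0, 6.0, 400)

densities = class_conditional_densities(x, y, grid)
```

Calling `plot_class_conditional_densities(x, y, grid)` followed by `matplotlib.pyplot.show()` should produce the overlaid curves. A histogram per class would work in place of the KDE if you prefer not to choose a bandwidth.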

You can also plot the probability that the class takes a particular value, given the value of the feature, that is $p(y=j \mid x_i)$. The density estimates already give $p(x_i \mid y)$, so use Bayes' rule:

$$p(y=j \mid x_i) = \frac{p(x_i \mid y=j) p(y=j)}{\sum_{l=1}^k p(x_i \mid y=l) p(y=l)}$$

You can estimate $p(y)$ using the frequencies of classes in the dataset. The right plot below shows an example (for the same dataset as the left plot). Only one curve is necessary for binary classification problems because, with the classes labeled 0 and 1, $p(y=0 \mid x_i) = 1 - p(y=1 \mid x_i)$. Multiple curves can be plotted for multiclass problems. Care must be taken when interpreting this plot because it may not be reliable in low density regions (where $p(x_i \mid y)$ is low for all classes); in these cases it can amplify noise in the density estimates. Plotting bootstrap confidence intervals around the curves is one way to show where the estimate is uncertain.
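The Bayes' rule step and the bootstrap idea might be sketched as follows. This assumes NumPy and SciPy, with `scipy.stats.gaussian_kde` for the densities. The helper names `class_posteriors` and `bootstrap_band`, the synthetic data, and the choice of a percentile bootstrap over resampled $(x, y)$ pairs are illustrative assumptions, not something the answer prescribes.

```python
import numpy as np
from scipy.stats import gaussian_kde


def class_posteriors(x, y, grid):
    """Return the classes and p(y=j | x) on `grid`, one row per class."""
    classes, counts = np.unique(y, return_counts=True)
    priors = counts / counts.sum()  # p(y=j) from class frequencies
    # Each row holds the numerator of Bayes' rule: p(x | y=j) * p(y=j).
    joint = np.array([gaussian_kde(x[y == j])(grid) * p
                      for j, p in zip(classes, priors)])
    # Dividing by the sum over classes gives the posterior.
    return classes, joint / joint.sum(axis=0)


def bootstrap_band(x, y, grid, n_boot=50, alpha=0.05, seed=0):
    """Percentile band for each posterior curve from resampled (x, y) pairs.

    Assumes every resample still contains several points of every class.
    """
    rng = np.random.default_rng(seed)
    curves = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(x), size=len(x))  # sample with replacement
        curves.append(class_posteriors(x[idx], y[idx], grid)[1])
    lower, upper = np.quantile(curves, [alpha / 2, 1 - alpha / 2], axis=0)
    return lower, upper


# Hypothetical binary dataset: one feature whose mean shifts with the class.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(2.0, 1.0, 200)])
y = np.concatenate([np.zeros(300, dtype=int), np.ones(200, dtype=int)])
grid = np.linspace(-3.0, 5.0, 200)

classes, posterior = class_posteriors(x, y, grid)
lower, upper = bootstrap_band(x, y, grid)
```

Plotting `posterior[1]` against `grid` gives the single curve needed in the binary case, and shading between `lower[1]` and `upper[1]` (for example with matplotlib's `fill_between`) shows the band. The band would be expected to widen toward the edges of the grid, where both class densities are small.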

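[Figure: left, class-conditional density estimates $p(x_i \mid y)$ for a binary classification problem; right, estimated $p(y \mid x_i)$ for the same dataset]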

Of course, this only gives a view of the marginal distributions, and the usual caveats apply. A feature on its own might contain no information about the class, but be highly informative in combination with other features. You could use the same type of plot to examine pairs of features, using bivariate density estimates.
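To illustrate the pairwise case, here is a small sketch using bivariate kernel density estimates. It assumes NumPy and SciPy. The XOR-style synthetic data is a made-up example of two features that are each nearly uninformative alone but separate the classes jointly, and `bivariate_class_densities` is a hypothetical helper name.

```python
import numpy as np
from scipy.stats import gaussian_kde


def bivariate_class_densities(X, y, points):
    """Estimate p(x_a, x_b | y=j) at `points` (shape (2, m)) for each class j."""
    # gaussian_kde expects data with shape (n_dims, n_samples), hence the transpose.
    return {j: gaussian_kde(X[y == j].T)(points) for j in np.unique(y)}


# Hypothetical XOR-like data: class 0 sits near (-1, -1) and (1, 1),
# class 1 near (-1, 1) and (1, -1), so the marginals of each feature
# look roughly the same for both classes.
rng = np.random.default_rng(0)
n = 200
centers = np.array([[-1.0, -1.0], [1.0, 1.0], [-1.0, 1.0], [1.0, -1.0]])
X = np.vstack([c + 0.4 * rng.standard_normal((n, 2)) for c in centers])
y = np.repeat([0, 0, 1, 1], n)

# Evaluate both class densities on a regular grid for contour plotting.
axis = np.linspace(-2.5, 2.5, 60)
xa, xb = np.meshgrid(axis, axis)
points = np.vstack([xa.ravel(), xb.ravel()])
densities = {j: d.reshape(xa.shape)
             for j, d in bivariate_class_densities(X, y, points).items()}
```

Each array in `densities` can be passed to matplotlib's `contour(xa, xb, ...)` with a different color per class. In this example the two sets of contours would occupy opposite quadrants even though the one-dimensional density plots of either feature overlap almost completely.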

user20160
  • It's been four months since the answer was posted. To be honest, I didn't understand most of it at the time, mainly because of the mathematical notation, but going back to it now I understand most of it. What actually helped back then was the graph, which let me comprehend the visualization I had been doing even before asking the question, without having developed the intuition for it. I had used seaborn's **pairplot**; pairplot with diag_kind="kde" can be used in a Python Jupyter notebook. – pritywiz Jun 18 '17 at 12:29