
I'm trying to do a visualization for a Logistic Regression (LR) model for a binary classification task.

I've built an LR model to predict the gender of English text authors (male/female) using scikit-learn in Python. I saved the model and extracted the features and coefficients. Now, given a new random text, I would like to show the model's decision procedure (I know how to do the actual prediction in code). For example, had it been a Decision Tree model, I'd just show the decision tree. My question is: what do I need in order to simulate the decision process of LR? Let's say I have the following text:

i hang out with my wife every weekend

And my features are: hang, my wife, weekend. My model calculates TF-IDF values of the features and builds a feature table like so:

| hang | my wife | weekend |
|------|---------|---------|
| 0.01 | 0.02    | 0.03    |
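
For reference, a minimal sketch of how such a table can be produced with scikit-learn's `TfidfVectorizer` (the fixed vocabulary and n-gram settings here are assumptions for illustration, not my actual pipeline):

```python
# Sketch: TF-IDF values for a fixed, assumed vocabulary.
from sklearn.feature_extraction.text import TfidfVectorizer

vectorizer = TfidfVectorizer(vocabulary=["hang", "my wife", "weekend"],
                             ngram_range=(1, 2))  # bigrams so "my wife" is a token
X = vectorizer.fit_transform(["i hang out with my wife every weekend"])

for name, value in zip(vectorizer.get_feature_names_out(), X.toarray()[0]):
    print(f"{name}: {value:.2f}")
```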

And let's say the model predicts a high probability of the author being male. Now I would like to show how it came to this decision based on some calculations. For example, a (wrong but easy) way would be to say "there are more male-related features than female-related features (and then point out the features), and thus this text was classified as male".

So how could I show the calculation process the model went through to come up with its decision? I'm not necessarily looking for a library that does it automatically; I only need to know the steps, and then I can implement something of my own.

I hope I'm clear enough; otherwise please let me know and I'll try to clarify further.

Alaa M.
    I think you need to read a simple introduction to [logistic regression](https://en.wikipedia.org/wiki/Logistic_regression) and the interpretation of its coefficients. CV.SE has a number of threads on the matter: this one here, https://stats.stackexchange.com/questions/86351/, is quite good to start with. Logistic regression coefficients have additive effects in the [log odds](https://en.wikipedia.org/wiki/Logit) domain; any "visualisation" needs to start there. – usεr11852 Dec 21 '21 at 14:20

3 Answers


One advantage of a decision tree is that it can be used to explain a single prediction: you follow the decision path, which mimics the reasoning process of a human being.

Logistic regression does not have this property. A logistic regression model is a global model fitted to all of your data, and it cannot explain a single prediction the way a decision tree does.

If you want to stick with logistic regression but still be able to explain a single prediction, you may consider a post-hoc explainability technique such as LORE (https://arxiv.org/abs/1805.10820), which builds a tree locally around the sample you want to explain (a sketch of the core idea follows below). While post-hoc techniques are normally intended for black-box models, they can also provide an explanation for a single prediction from your logistic regression.
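
A minimal sketch of the local-surrogate idea under simplifying assumptions (Gaussian perturbation instead of LORE's genetic neighbourhood generation; `clf` is the trained logistic regression and `x` the TF-IDF row to explain, both assumed to exist):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

def local_surrogate_tree(clf, x, feature_names, n_samples=1000, scale=0.05):
    rng = np.random.default_rng(0)
    # Generate a synthetic neighbourhood around the instance to explain.
    neighbourhood = x + rng.normal(0.0, scale, size=(n_samples, x.shape[0]))
    neighbourhood = np.clip(neighbourhood, 0.0, None)  # TF-IDF values are non-negative
    # Label the neighbourhood with the model we want to explain.
    labels = clf.predict(neighbourhood)
    # Fit a shallow tree that mimics the model locally; its rules are the explanation.
    tree = DecisionTreeClassifier(max_depth=3).fit(neighbourhood, labels)
    return export_text(tree, feature_names=list(feature_names))
```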

user344849

It seems you're looking for something along the lines of SHAP. Have a look at its examples, and keep in mind that explainability techniques are often not causal: they only provide associative measures of explanation.
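
For instance, a hedged sketch with the `shap` package's linear-model explainer (the variable names here are placeholders; `X_train` is assumed to be a dense TF-IDF training matrix and `clf` the fitted model):

```python
import shap

# Background data is needed to compute expectations for a linear model.
explainer = shap.LinearExplainer(clf, X_train)
shap_values = explainer.shap_values(X_new)  # one additive contribution per feature

# Visualise how each feature pushes the prediction away from the base value.
shap.force_plot(explainer.expected_value, shap_values[0], X_new[0],
                feature_names=feature_names, matplotlib=True)
```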

DaSim

As @usεr11852 suggested, I read through the Logistic Regression article on Wikipedia and found the following formula:

$$P(x)=\frac{1}{1+e^{-(\beta_0 + \sum_i{\beta_i x_i})}}$$

where:

$P(x)$: probability of a positive sample
$\beta_0$: logistic regression intercept
$\beta_i$: coefficient $i$
$x_i$: value of feature $i$

I have the coefficients (clf.coef_), the intercept (clf.intercept_), and the feature values, so this is now easy to implement.
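
A sketch of that computation, breaking the log-odds down per feature (`clf` is the fitted `LogisticRegression`, `x` a 1-D array of TF-IDF values, and `feature_names` the matching names, all assumed to exist):

```python
import numpy as np

contributions = clf.coef_[0] * x             # beta_i * x_i, one term per feature
z = clf.intercept_[0] + contributions.sum()  # log-odds
p = 1.0 / (1.0 + np.exp(-z))                 # P(x); should match clf.predict_proba

# Show the features sorted by the size of their contribution.
for name, c in sorted(zip(feature_names, contributions), key=lambda t: -abs(t[1])):
    print(f"{name}: {c:+.4f}")
print(f"intercept: {clf.intercept_[0]:+.4f}  ->  P(positive) = {p:.3f}")
```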

Alaa M.