Overall, I'd like to be able to say that, for the logistic prediction for a given row, ColA was more influential than ColB in driving up the resulting probability (i.e., y_hat, as it's usually defined for logistic regression). But is this possible? Some data scientists I've talked to say yes, but I've also seen push-back.
From what I've read, it seems that GLMs make it easiest to get at per-row variable importance (see this limited discussion on logit in particular, including push-back). But can they actually do it?
If B1 and B2 are the coefficients and X1 and X2 are our features, it would seem that whenever B1*X1 is greater than B2*X2, the B1*X1 term drives the resulting probability towards 1 more than the B2*X2 term does, since the terms add on the log-odds scale and the logistic link is monotone increasing. Here's an example (which brings in a factor column, for a fuller treatment).
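To make that reasoning concrete, here is a minimal sketch with hypothetical coefficients and a single hypothetical row (none of these numbers come from the model below):
# Logistic model: log-odds = B0 + B1*X1 + B2*X2, and y_hat = plogis(log-odds).
# Each Bk*Xk is an additive contribution on the log-odds scale, and plogis()
# is monotone increasing, so the larger contribution pushes y_hat up more.
b0 <- -0.5; b1 <- 2.0; b2 <- -1.0   # hypothetical coefficients
x1 <- 0.8;  x2 <- 1                 # one hypothetical row
contrib <- c(X1 = b1 * x1, X2 = b2 * x2)
contrib                             # per-feature log-odds contributions
plogis(b0 + sum(contrib))           # the row's predicted probability (y_hat)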
We create features X1 and X2, where X1 is random and X2 (I think we can agree) has a large positive impact on y:
set.seed(33)
X1 <- runif(10, 0.0, 1.0)
X2 <- c(1,0,1,0,1,0,1,0,1,0)
y <- c(1,1,1,0,1,0,1,0,1,0)
df <- data.frame(X1,X2,y)
dforig <- df # Keep a numeric copy, because the multiplication below doesn't work with factor columns
df$X2 <- as.factor(df$X2)
Now we create the logit model:
fit.logit <- glm(
  formula = y ~ .,
  data = df,
  family = binomial(link = "logit"))
                     X1       X21
Coefficients:   -1.2353   22.0041
Wald statistic:  -0.267     0.003
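(If it helps reproduce the table above, the estimates and Wald z statistics can be pulled from the model summary; a sketch, assuming fit.logit as fitted above:)
# Estimates and Wald z statistics from the fitted model
coef_tab <- summary(fit.logit)$coefficients
coef_tab[, c("Estimate", "z value")]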
Now we multiply B1 and B2 by X1 and X2, respectively, and print the results:
coefftemp <- fit.logit$coefficients
coefficients <- coefftemp[2:length(coefftemp)] # drop the intercept
multiply_res <- sweep(dforig[,1:2], 2, coefficients, `*`)
multiply_res
X1 X2
1 -0.55087679 22.00411
2 -0.48751729 0.00000
3 -0.59755734 22.00411
4 -1.13510089 0.00000
5 -1.04245907 22.00411
6 -0.63908954 0.00000
7 -0.53998690 22.00411
8 -0.42395777 0.00000
9 -0.01916833 22.00411
10 -0.14575621 0.00000
We see that in the rows where X2 = 1, B2*X2 (i.e., the second column) is much higher than B1*X1 (i.e., the first column). So it would seem that, for those rows, X2 is the dominant feature driving the resulting prediction towards 1.
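One sanity check on this reading (a sketch, assuming fit.logit and multiply_res as defined above): the per-row products plus the intercept should reproduce each row's fitted log-odds exactly, since the model is additive on that scale, and the larger column per row then marks the "dominant" feature.
# Reconstruct each row's linear predictor (log-odds) from the contributions
lin_pred <- fit.logit$coefficients[1] + rowSums(multiply_res)
all.equal(unname(lin_pred), unname(predict(fit.logit)))  # should be TRUE (up to floating-point error)
plogis(lin_pred)                                         # the y_hat values

# Per-row dominant feature: the larger log-odds contribution
colnames(multiply_res)[max.col(multiply_res, ties.method = "first")]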
If one reverses the dependency of y on X2 by swapping the zeros and ones in X2, then after doing the multiplication, B2*X2 is much lower than B1*X1 in the rows where X2 = 1, which makes sense (since X2 now pushes y_hat towards 0 when X2 = 1). Thus, for these rows, X1 is actually more "responsible" for driving y_hat towards 1. (Note that if both contributions are negative, the least negative one is the feature more responsible for y_hat being as high as it is.) Because of this, it would seem that this method of per-row feature ranking still works (see the ranking sketch after the reversed-case output below). What am I missing?
In case it helps, the code for the latter (reversed dependency) case is below:
# Reverse y dependency on X2
set.seed(33)
X1 <- runif(10, 0.0, 1.0)
X2 <- c(0,1,0,1,0,1,0,1,0,1)
y <- c(1,1,1,0,1,0,1,0,1,0)
df <- data.frame(X1,X2,y)
dforig <- df # Keep a numeric copy, because the multiplication below doesn't work with factor columns
df$X2 <- as.factor(df$X2)
fit.logit <- glm(
  formula = y ~ .,
  data = df,
  family = binomial(link = "logit"))
                     X1       X21
Coefficients:    -1.235   -22.004
Wald statistic:  -0.267    -0.003
coefftemp <- fit.logit$coefficients
coefficients <- coefftemp[2:length(coefftemp)] # drop intercept
multiply_res <- sweep(dforig[,1:2], 2, coefficients, `*`)
multiply_res
X1 X2
1 -0.55087679 0.00000
2 -0.48751729 -22.00411
3 -0.59755734 0.00000
4 -1.13510089 -22.00411
5 -1.04245907 0.00000
6 -0.63908954 -22.00411
7 -0.53998690 0.00000
8 -0.42395777 -22.00411
9 -0.01916833 0.00000
10 -0.14575621 -22.00411
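Applying the per-row ranking idea to this reversed case (a sketch; rank_toward_one is a hypothetical helper, not from any package) orders the features by their log-odds contribution, largest first, so the first name in each row is the one most responsible for pushing y_hat toward 1:
# For each row, order features by contribution, largest (least negative) first
rank_toward_one <- function(contribs) {
  t(apply(contribs, 1, function(row) names(sort(row, decreasing = TRUE))))
}
rank_toward_one(multiply_res)   # X1 comes first in the rows where X2 = 1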
Overall, for logistic regression, can we accurately say (for example) that feature A drives y_hat toward 1 more than feature B for an individual prediction?
Thanks, all!