
I am modeling an outcome for hospital patients, 'RA' (whether readmitted; 0=No, 1=Yes). My predictor of interest is 'HHS' (whether referred to Home Health Services such as from a visiting nurse; 0=No, 1=Yes). Those referred readmit at a 15.2% rate; others, 9.2%, but the former are needier, sicker patients. Conventional thinking is that if we controlled for severity of illness this difference would not only be washed out but would reverse itself. In other words, holding constant the severity of illness, having HHS should mean a lower RA rate.

With HHS as the sole predictor, its coefficient (B) in a logistic regression = 0.6 (N ~ 25k). B is reduced to 0.2 with a group of covariates controlled, each accounting for some aspect of severity of illness, but B doesn't fall below zero.

HHS alone explains only about 1% of the variance in RA; with the other predictors, this becomes 4%.* Perhaps this is the problem--that these covariates are not explaining enough variance to "succeed" in reversing the sign of the coefficient of interest. If this is true, is there a way to estimate how high their explained variance needs to be for such a reversal to show up?

EDIT: Alecos Papadopoulos has come up with an impressive solution that answers this question, soon to be published in The American Statistician. See https://www.tandfonline.com/doi/full/10.1080/00031305.2019.1704873


*Using either of two pseudo-R-squared formulas: Cox & Snell's, or Menard's $\left[-2LL_0 - (-2LL_1)\right] / (-2LL_0)$.
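For concreteness, the two figures are computed roughly like this in R (a sketch with illustrative names, not my actual code; assume a data frame `d` holding the binary RA and HHS columns):

fit0 <- glm(RA ~ 1,   family = binomial, data = d)  # intercept-only (null) model
fit1 <- glm(RA ~ HHS, family = binomial, data = d)  # HHS as the sole predictor

# Menard's: [-2LL0 - (-2LL1)] / (-2LL0); with ungrouped 0/1 data, deviance() = -2*logLik
menard <- (deviance(fit0) - deviance(fit1)) / deviance(fit0)

# Cox & Snell's: 1 - (L0/L1)^(2/n)
n <- nrow(d)
cox.snell <- 1 - exp((2 / n) * (as.numeric(logLik(fit0)) - as.numeric(logLik(fit1))))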

rolando2
  • It sounds in your text that you equate "reversing the sign of the coefficient" with "suppressor effect". But actually these two are different phenomena. Suppressing can exist without sign reversal, and vice versa. – ttnphns Oct 11 '13 at 19:51
  • Hi ttnphns. A reversal is one way a suppressor effect can work, would you say that's right? – rolando2 Oct 12 '13 at 12:41
  • Yes, I think. Adding a suppressor can reverse the sign of a coefficient, or it can leave the sign unchanged. So what is your question about - a suppressing phenomenon or a changing of a sign phenomenon? – ttnphns Oct 12 '13 at 13:05
  • You might get more serious attention if you were to explain your undefined terms and acronyms, in particular "RSQ" and "B." Although many readers will make educated guesses, the more experienced of them will know that there are multiple possible correct guesses. For example, the meaning of "B" depends on how both `RA` and `HHS` are encoded, so even your statements about its sign are ambiguous. – whuber Oct 14 '13 at 21:08
  • @rolando2, let me repeat it again: a coefficient's sign reversal in response to adding a new predictor does not necessarily make that predictor a suppressor. And vice versa, adding a clear suppressor does not necessarily change the sign. The title of your question remains ambiguous. Choose: either you ask about sign reversal or about the suppressing effect. Or about when suppressing and sign reversal will coincide. – ttnphns Oct 19 '13 at 09:20
  • I haven't really had a chance to get to this, although I've wanted to. It's worth noting @ttnphns' point, though. The sign reversal has to do with endogeneity, not suppression. For reference, I have discussed endogeneity [here](http://stats.stackexchange.com/q/58709//58712#58712), & suppression [here](http://stats.stackexchange.com/q/33888//34016#34016). – gung - Reinstate Monica Oct 21 '13 at 04:45
  • @gung, you link to useful answers of yours. (Though I would doubt that sign reversal is always due to endogeneity.) If you like you might post a question about suppression etc and people, including yourself, might give their answers. – ttnphns Oct 21 '13 at 06:06
  • I've upvoted all 3 answers to date as useful, yet none of them seems clearly, definitively to answer the question (each neglects logistic reg., or introduces the "no intercept" condition); they are at odds with one another; and the one with the most upvotes (3, besides mine) has what seems right now to be an empirical refutation, as my comment about the 5k regression shows. Thus I'm not ready to award a bounty. – rolando2 Oct 22 '13 at 11:42

3 Answers


(This answer uses results from W. H. Greene (2003), *Econometric Analysis*, 5th ed., ch. 21.)

I will answer the following modified version, which I believe accomplishes the goals of the OP's question: "If we estimate only the logit model that contains one binary regressor of interest and some (dummy or continuous) control variables, can we tell whether dropping the control variables will result in a change of sign for the coefficient of the regressor of interest?"

Notation: Let $RA\equiv Y$ be the dependent variable, $HHS \equiv X$ the binary regressor of interest and $\mathbf Z$ a matrix of control variables. The size of the sample is $n$. Denote $n_0$ the number of zero-realizations of $X$ and $n_1$ the number of non-zero realizations, $n_0+n_1=n$. Denote $\Lambda()$ the cdf of the logistic distribution.
Let the model including the control variables (the "unrestricted" model) be

$$M_U : \begin{align} &P(Y=1\mid X,\mathbf Z)=\Lambda(X, \mathbf Z,b,\mathbf c)\\ &P(Y=0\mid X,\mathbf Z)=1-\Lambda(X, \mathbf Z,b,\mathbf c) \end{align}$$

where $b$ is the coefficient on the regressor of interest.
Let the model including only the regressor of interest (the "restricted" model) be

$$M_R : \begin{align} &P(Y=1\mid X)=\Lambda(X, \beta)\\ &P(Y=0\mid X)=1-\Lambda(X,\beta) \end{align}$$

STEP 1

Consider the unrestricted model. The first-derivative of the log-likelihood w.r.t to $b$ and the condition for a maximum is

$$\frac {\partial \ln L_U}{\partial b}= \sum_{i=1}^n\left[y_i-\Lambda_i(x_i, \mathbf z_i,b,\mathbf c)\right]x_i=0 \Rightarrow b^*: \sum_{i=1}^ny_ix_i=\sum_{i=1}^n\Lambda_i(x_i, \mathbf z_i,b^*,\mathbf c^*)x_i \qquad[1]$$

The analogous relation for the restricted model is $$\frac {\partial \ln L_R}{\partial \beta}= \sum_{i=1}^n\left[y_i-\Lambda_i(x_i,\beta)\right]x_i=0 \Rightarrow \beta^*: \sum_{i=1}^ny_ix_i=\sum_{i=1}^n\Lambda_i(x_i, \beta^*)x_i \qquad[2]$$

We have

$$\Lambda_i(x_i,\beta^*) = \frac {1}{1+e^{-x_i\beta^*}}$$

and since $X$ is a zero/one binary variable, relation $[2]$ can be written

$$\beta^*: \sum_{i=1}^ny_ix_i=\frac {n_1}{1+e^{-\beta^*}} \qquad[2a]$$

Combining $[1]$ and $[2a]$ and using again the fact that $X$ is binary we obtain the following equality relation between the estimated coefficients of the two models:

$$\frac {n_1}{1+e^{-\beta^*}} = \sum_{i=1}^n\Lambda_i(x_i, \mathbf z_i,b^*,\mathbf c^*)x_i $$

$$\Rightarrow \frac {1}{1+e^{-\beta^*}} = \frac {1}{n_1}\sum_{x_i=1}\Lambda_i(x_i=1, \mathbf z_i,b^*,\mathbf c^*) \qquad [3]$$

$$\Rightarrow \hat P_R(Y=1\mid X=1) = \hat {\bar P_U}(Y=1\mid X=1,\mathbf Z) \qquad [3a]$$

or, in words: the estimated probability of $Y=1$ given $X=1$ from the restricted model will equal the average of the estimated probabilities from the model that includes the control variables, taken over the observations with $x_i=1$.
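As a quick numerical check of $[3a]$, here is a sketch on simulated data (not the OP's data), with both models specified without a constant term, as above:

set.seed(1)
n <- 5000
z <- rnorm(n)                                   # one continuous control variable
x <- rbinom(n, 1, 0.4)                          # the binary regressor of interest
y <- rbinom(n, 1, plogis(0.6 * x + 0.8 * z))    # arbitrary "true" coefficients

fitU <- glm(y ~ 0 + x + z, family = binomial)   # unrestricted model, no constant
fitR <- glm(y ~ 0 + x,     family = binomial)   # restricted model, no constant

plogis(coef(fitR)["x"])        # left-hand side of [3a]
mean(fitted(fitU)[x == 1])     # right-hand side of [3a]: average over the x_i = 1 cases

Both lines should print the same value, namely the sample proportion of $y_i=1$ among the observations with $x_i=1$ (which is what the two first-order conditions force), up to the convergence tolerance of the fits.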

STEP 2

For a sole binary regressor in a logistic regression with no constant term, its marginal effect $m_R(X)$ is

$$ \hat m_R(X)= \hat P_R(Y=1\mid X=1) - \hat P_R(Y=1\mid X=0)$$

$$ \Rightarrow \hat m_R(X) = \frac {1}{1+e^{-\beta^*}} - \frac 12$$

and using $[3]$

$$ \hat m_R(X) = \frac {1}{n_1}\sum_{x_i=1}\Lambda_i(x_i=1, \mathbf z_i,b^*,\mathbf c^*) - \frac 12 \qquad [4]$$

For the unrestricted model that includes the control variables we have

$$ \hat m_U(X)= \hat P_U(Y=1\mid X=1, \bar {\mathbf z}) - \hat P_U(Y=1\mid X=0, \bar {\mathbf z})$$

$$\Rightarrow \hat m_U(X) = \frac {1}{1+e^{-b^*-\bar {\mathbf z}'\mathbf c^*}} - \frac {1}{1+e^{-\bar {\mathbf z}'\mathbf c^*}} \qquad [5]$$

where $\bar {\mathbf z}$ contains the sample means of the control variables.

It is easy to see that the marginal effect of $X$ has the same sign as its estimated coefficient. Since we have expressed the marginal effect of $X$ from both models in terms of the estimated coefficients from the unrestricted model, we can estimate only the latter and then calculate the two expressions $[4]$ and $[5]$, which will tell us whether we will observe a sign reversal for the coefficient of $X$ or not, without the need to estimate the restricted model.
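Continuing the simulated sketch from Step 1 (the objects fitU, x and z are carried over), the sign check uses only the unrestricted fit:

b.hat <- coef(fitU)["x"]
c.hat <- coef(fitU)["z"]
z.bar <- mean(z)

m.R <- mean(fitted(fitU)[x == 1]) - 0.5                       # relation [4]
m.U <- plogis(b.hat + z.bar * c.hat) - plogis(z.bar * c.hat)  # relation [5]

sign(m.R) == sign(m.U)   # FALSE would indicate a sign reversal once the controls are dropped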

Alecos Papadopoulos
  • I'm not the swiftest; are you arguing that with large samples there is never a sound basis for expecting a sign to reverse, theory or no theory? – rolando2 Oct 16 '13 at 15:31
  • Yes, this is the result that emerges (for the particular model specification examined). I don't know whether it is a known result; it was new to me, but it is correct. But you shouldn't be surprised: each specific model has a "technical identity" that creates certain rigidities that do not permit the model to accommodate all possible aspects of a real phenomenon (such as a sign reversal in your case). – Alecos Papadopoulos Oct 16 '13 at 16:35
  • It sounds as if you are consigning statistical control to a much more limited role than many people believe it can play. And that as a consequence of your conclusion many authors (and contributors to this site) would have to completely revise what they've written about sign reversals in regression. – rolando2 Oct 16 '13 at 16:41
  • You are generalizing the result, while I don't. It is not I who concludes - it is the mathematics of the model that lead to the conclusion, which holds for the exact specific assumptions of this particular model: a logit model. One Binary regressor of interest. No Constant Term. The case of dropping _all_ control variables at once. Perhaps if you change any one of these assumptions, the result may not hold (or perhaps it will). – Alecos Papadopoulos Oct 16 '13 at 16:49
  • Thank you for your responses. I'm puzzled by all of them :-) – rolando2 Oct 16 '13 at 16:55
  • "Puzzled" is the best start to learn and really understand anything - at least from my personal experience! – Alecos Papadopoulos Oct 16 '13 at 18:26
  • I can't find a mistake with your math. I have run a model, though, that fits all the conditions you've described (except that the control variables are *added* all at once, not dropped). This arrangement, the first I've run through the origin, is actually the only one I've used that does create a sign reversal. N is 5k. – rolando2 Oct 18 '13 at 14:18
  • That's interesting. Can you calculate the two empirical probabilities of eq [6] in the model with the control variables present? – Alecos Papadopoulos Oct 18 '13 at 18:30
  • I wish I could follow it. With laypeople I say I'm a statistician, but with statisticians I say I'm a...researcher. – rolando2 Oct 18 '13 at 20:37

This is for OLS regression. Consider a geometric representation of three variables -- two predictors, $X_1$ and $X_2$, and a dependent variable, $Y$. Each variable is represented by a vector from the origin. The length of the vector equals the standard deviation of the corresponding variable. The cosine of the angle between any two vectors equals the correlation of the corresponding two variables. I will take all the standard deviations to be 1.

[Figure: the $X_1$–$X_2$ plane; the dashed vector is the projection $\hat{Y}$ of $Y$, and the yellow, red, and blue sectors mark the possible sign patterns of the coefficients.]

The picture shows the plane determined by $X_1$ and $X_2$ when they correlate positively with one another. $Y$ is a vector coming out of the screen; the dashed line is its projection into the predictor space and is the regression estimate of $Y$, $\hat{Y}$. The length of the dashed line equals the multiple correlation, $R$, of $Y$ with $X_1$ and $X_2$.

If the projection is in any of the colored sectors then both predictors correlate positively with $Y$. The signs of the regression coefficients $\beta_1$ and $\beta_2$ are immediately apparent visually, because $\hat{Y}$ is the vector sum of $\beta_1 X_1$ and $\beta_2 X_2$. If the projection is in the yellow sector then both $\beta_1$ and $\beta_2$ are positive, but if the projection is in either the red or the blue sector then we have what appears to be suppression; that is, the sign of one of the regression weights is opposite to the sign of the corresponding simple correlation with $Y$. In the picture, $\beta_1$ is positive and $\beta_2$ is negative.

Since the length of the projection can vary between 0 and 1 no matter where it is in the predictor space, there is no minimum $R^2$ for suppression.
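A small numerical illustration of that last point (my own toy correlations, with all variables standardized, so that $\boldsymbol\beta = R_{xx}^{-1}\mathbf r_{xy}$ and $R^2 = \boldsymbol\beta'\mathbf r_{xy}$): scaling the correlations with $Y$ down leaves the suppression pattern intact while $R^2$ becomes arbitrarily small.

suppression.demo <- function(r12, r1y, r2y) {
  Rxx  <- matrix(c(1, r12, r12, 1), nrow = 2)  # predictor intercorrelation matrix
  rxy  <- c(r1y, r2y)                          # correlations of the predictors with Y
  beta <- solve(Rxx, rxy)                      # standardized regression weights
  c(beta1 = beta[1], beta2 = beta[2], R2 = sum(beta * rxy))
}
suppression.demo(0.5, 0.50, 0.10)  # beta2 = -0.2 although r2y = +0.1; R^2 = 0.28
suppression.demo(0.5, 0.05, 0.01)  # same sign pattern; R^2 = 0.0028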

Ray Koopman
  • To me, the relative strength of coefficients matters a lot, as I tried to say in a recent comment and edits. And I've just learned about Leamer, E. E., "A Result on the Sign of Restricted Least Squares Estimates," Journal of Econometrics, 3 (1975), 387-390. See a brief summary at http://davegiles.blogspot.com/2013/05/when-can-regression-coefficients-change.html. Apparently in OLS there is a minimum predictive power required of one variable (relative to that of another) in order for its inclusion to cause a sign change for the other. I'd like to know the rule for groups of covariates, in logit. – rolando2 Oct 14 '13 at 20:55
  • @Ray, your picture is a viable explanation of the sign of a coefficient; it is like the picture [here](http://stats.stackexchange.com/a/70910/3277), only 2D. But I don't see how it can explain _suppression_. To show suppression you must show the error term, because suppression is defined with respect to it. – ttnphns Oct 19 '13 at 08:50
  • @ttnphns My definition of suppression needs only the betas. However interesting it would be to discuss what definitions of suppression might be most useful for what purposes in what situations, it would also be against Stack Exchange policy, so I guess we're going to have to agree to disagree here. – Ray Koopman Oct 19 '13 at 21:31

There is no obvious relationship between $R^2$ and reversal of the sign of a regression coefficient. Assume you have data for which the true model is, for example, $$ y_i = 0 + 5x_i - z_i + \epsilon_i $$ with $\epsilon_i \sim N(0, sd_\text{error}^2)$. I show the zero to make explicit that the intercept of the true model is zero; this is just a simplification.

When $x$ and $z$ are highly correlated and centered about zero, the coefficient of $z$ when regressing $y$ on just $z$ will be positive instead of negative. Note that the true model coefficients do not change with $sd_\text{error}$, but you can make $R^2$ vary between zero and one by changing the magnitude of the residual error. Look for example at the following R code:

require(MASS)
sd.error <- 1
x.and.z <- mvrnorm(1000, c(0,0) , matrix(c(1, 0.9,0.9,1),nrow=2)) # set correlation to 0.9
x <- x.and.z[, 1]
z <- x.and.z[, 2]
y <- 5*x - z + rnorm(1000, 0, sd.error) # true model
modell1 <- lm(y~x+z)
modell2 <- lm(y~z)
print(summary(modell1)) # coefficient of z should be negative
print(summary(modell2)) # coefficient of z should be positive   

and play a bit with sd.error. Look for example at $sd_\text{error}=50$.

Note that with a very large sd.error the coefficient estimates will become more unstable and the reversal might not show up every time. But that is a limitation of the finite sample size.

A short summary: the variance of the error does not affect the expected values of the coefficients, and thus does not affect the reversal. Therefore neither does $R^2$.
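As a rough check of that summary (a sketch reusing x and z from the code above, and rebinding sd.error), you can sweep sd.error and watch $R^2$ collapse while the expected sign pattern of the z coefficients stays the same; at sd.error = 50 the individual estimates get noisy, as noted:

set.seed(42)
for (sd.error in c(1, 10, 50)) {
  y.sim  <- 5 * x - z + rnorm(1000, 0, sd.error)  # same true model, different noise level
  m.full <- lm(y.sim ~ x + z)
  m.red  <- lm(y.sim ~ z)
  cat("sd.error =", sd.error,
      " R2 =", round(summary(m.full)$r.squared, 3),
      " z(full) =", round(coef(m.full)["z"], 2),
      " z(reduced) =", round(coef(m.red)["z"], 2), "\n")
}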

Erik
  • I'll study this. Did you intend 'x.and.z' instead of 'x.and.y' in lines 4 and 5 of your R code? – rolando2 Oct 14 '13 at 14:02
  • Yes, thanks. At first I always mistakenly used x.and.y then I noticed the mistake and fixed it in just the first line; the R code continued working for me since I had not cleared my workspace. Fixed now. – Erik Oct 14 '13 at 14:18
  • I'd like others' help in assessing whether your example constitutes a sound litmus test. But I'm seeing how your example supports your conclusion. I've tried out your simulation about 30 times, using a variety of sd.error values from 10 to 50. The sd.error and the Z coefficient are correlated at 0.18 with p = .3. – rolando2 Oct 14 '13 at 14:57
  • Still, doesn't it trouble you to think that nearly ineffectual control for covariates would have the same expected effect on a focal coefficient as very thorough control would? – rolando2 Oct 14 '13 at 15:00
  • I think a stumbling point is that changing your sd.error exerts an equal effect on both X's and Z's squared correlation with Y. I've edited my question very slightly to reflect my interest in what happens when X's connection to Y (and not Z's) gets stronger. – rolando2 Oct 14 '13 at 15:22
  • @rolando2 This is a reply to your comment on ineffectual control: your question still needs a bit more refinement regarding your definition of reversal. Are you interested in a reversal of the expectations of the coefficients? The answer above and the one below address that. Or do you want to consider the power to statistically *detect* reversal (e.g. reject the null hypothesis that the coefficients of both the full and reduced model have the same sign)? The latter depends on the residual variance and the magnitude of the expectations of the coefficient (thus indirectly also on R squared). – Erik Oct 15 '13 at 06:55
  • Thanks @Erik! Of your 2, it's close to the former. I'm asking about "what conditions are necessary in order to obtain the theoretically expected reversal of a coefficient." – rolando2 Oct 15 '13 at 12:09