
Consider a case where I have developed a predictive model using logistic regression. The logistic model gives a non-zero probability even when all the inputs are zero (because of the intercept). Now consider the case of predicting whether a subject has a disease. Here the probability carries a lot of significance: even when all the inputs are zero, the model says there is some probability that the subject has the disease. Is there a way we can remove this? Can we make a condition in the output that when all the inputs are zero the probability is zero, or can we subtract from the probability the probability obtained when all inputs are zero?

To explain further, consider this example from Wikipedia https://en.wikipedia.org/wiki/Logistic_regression#Example:_Probability_of_passing_an_exam_versus_hours_of_study

The logistic model is

             Coefficient    Std.Error   z-value    P-value (Wald)
Intercept    −4.0777        1.7610      −2.316     0.0206
Hours         1.5046        0.6287       2.393     0.0167

And the output of the model is

Hours of study  Probability of passing exam
1               0.07
2               0.26
3               0.61
4               0.87
5               0.97

Here, when "Hours of study" is 0, the model's output is about 0.02. How can we remove this bias?
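
To see where that 0.02 comes from, here is a minimal R sketch that simply plugs the coefficients from the table above into the inverse logit (`plogis` in base R); these are the published Wikipedia values, not a fitted object:

# Predicted probability of passing, using the Wikipedia coefficients
b0 <- -4.0777                     # intercept
b1 <-  1.5046                     # coefficient for hours of study
hours <- 0:5
round(plogis(b0 + b1*hours), 2)   # 0.02 0.07 0.26 0.61 0.87 0.97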

prashanth
  • See [Deliberately fitting a model without intercept](https://stats.stackexchange.com/q/80790/17230) & [When is it ok to remove the intercept in a linear regression model?](https://stats.stackexchange.com/q/7948/17230). It's not exactly the same situation, but the same issues apply with regard to forcing a fitted line through a stipulated point outside the range of the data. (And is it really unrealistic to predict that someone who doesn't study at all for an exam has a 2% chance of passing it?) – Scortchi - Reinstate Monica Apr 19 '17 at 12:29
  • @Scortchi's comment is the answer. Two other thoughts: 1) don't be overly eager to force your model, it's tempting but often causes unintended problems, and 2) if you are making a decision, you don't act directly on the logistic regression's output: you'll have a decision rule of some sort, in the simplest case a threshold for a go/no-go decision, but you can easily say that if the probability is less than 10% you consider that it won't happen. Not to mention, is your output actually well-calibrated? (Logistic regression will tend to be but no guarantee.) – Wayne Apr 19 '17 at 12:50
  • @Wayne Thank you for the insightful comment. I will check into the calibration part. And setting a probability threshold makes sense. But is it wrong to stipulate that when all inputs are zero the probability is zero, and that the probability equation applies only when at least one of the inputs is non-zero? – prashanth Apr 19 '17 at 13:53
  • @Scortchi yes, 2% is not a significant value for an exam, but consider a case where one needs to predict the presence of a disease. A 2% chance of having a disease may well be significant. – prashanth Apr 19 '17 at 13:56

1 Answer


If you know a priori that the probability of the event $p_i$ must be zero when a covariate $x_i$ is zero, you can model this by including $\ln x_i$ instead of $x_i$ in your model. You then have $$ \operatorname{logit}p_i = \ln \frac{p_i}{1-p_i} = \beta_0 + \beta_1\ln x_i. \tag{1} $$ This implies that the relationship between the odds of the event and $x_i$ is described by the power law $$ \frac{p_i}{1-p_i} = e^{\beta_0} x_i^{\beta_1}, \tag{2} $$ such that the odds are directly proportional to $x_i$ when $\beta_1=1$, which might be sensible, but this of course depends very much on the underlying mechanisms generating the data.
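
As a quick numerical check (a sketch with arbitrary coefficient values, not estimates from any data), the two forms agree: the probabilities from the logit model on $\ln x_i$ in (1) match those implied by the power-law odds in (2), and both go to zero as $x_i \to 0$ whenever $\beta_1 > 0$:

# Sketch: eq. (1) and eq. (2) give identical probabilities (arbitrary b0, b1)
b0 <- 0.1; b1 <- 0.9
x <- c(0, 0.5, 1, 2, 5)
p1 <- plogis(b0 + b1*log(x))   # eq. (1): logit p = b0 + b1*log(x); log(0) = -Inf gives p = 0
odds <- exp(b0) * x^b1         # eq. (2): power-law odds; 0^b1 = 0 for b1 > 0
p2 <- odds/(1 + odds)
cbind(x, p1, p2)               # p1 and p2 coincide; both equal 0 at x = 0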

If the data include observations for which both the covariate $x_i=0$ and the response $y_i=0$, special care must be taken when fitting the model using, for example, `glm` in R. Under the model specified by (2), such observations have probability $P(y_i=0)=1$ for any value of $\beta_1>0$ and thus contribute a constant equal to $\ln 1=0$ to the total log likelihood (for $\beta_1\le 0$ such observations would be impossible). Provided that $\beta_1>0$, the overall likelihood can thus be maximised by maximising the likelihood contribution from the remaining observations for which $x_i>0$, that is, by removing observations for which $x_i=0$ before fitting the model, as in the following example. Hence, we never need to take the logarithm of zero at any point.

# Simulate some data from the model
n <- 100
x <- seq(0,10,len=n)
eta <- .1+.9*log(x)
p <- 1/(1+exp(-eta)) # this gives zero for eta = -Inf
y <- rbinom(n, size=10, prob=p)

# Plot the observed data
plot(x,y/10)

# Fit the model omitting observations for which x == 0
data <- data.frame(x,y)
model <- glm(cbind(y,10-y)~log(x), binomial, data[x>0,])

# Compute predicted probabilities from the fitted model (x == 0 causes no problems here)
xx <- seq(0,10,len=100)
lines(xx,predict(model, newdata=data.frame(x=xx), type="response"))

[Figure: observed proportions y/10 plotted against x, with the fitted probability curve overlaid]
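
As a final sanity check (assuming the estimated coefficient on log(x) comes out positive, as it should with data simulated from this model), predicting at x = 0 returns a probability of exactly zero, because the linear predictor is -Inf there:

# Sanity check: the fitted model assigns probability exactly 0 to x = 0
predict(model, newdata = data.frame(x = 0), type = "response")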

Jarle Tufto
  • (-1) How do you find the logarithm of zero? – Scortchi - Reinstate Monica Apr 19 '17 at 12:34
  • @Scortchi I don't see that there are any issues with the model as specified by eq. (2). If you have some observations where the event is not observed for $x_i=0$, this is only possible if $\beta_1>0$ so yes, this would constrain the MLE of $\beta_1$ to be positive. Beyond this, all other information in the data about $\beta_1$ is contained in the other observations for which $x_i>0$. Hence, $\beta_1$ can be estimated using the glm represented by eq. (1) without ever needing to take the log of zero. – Jarle Tufto Apr 19 '17 at 13:37
  • Sorry! (-(-1)). I see what you mean - as long as you don't observe the event when $x_i=0$ it's contributing a factor of $0^0=1$ to the likelihood (or a term of $0 \log 0 \approx 0$ to the log-likelihood). *Can* you fit it as a generalized linear model though, or do you have to maximize the likelihood directly? – Scortchi - Reinstate Monica Apr 19 '17 at 14:37
  • @Scortchi Any observation $y_i=0, x_i=0$ would according to (2) have probability $P(y_i=0)=1$ for any $\beta_1>0$ and would contribute a constant $\log(1)=0$ to the log likelihood. The remaining log likelihood contribution can be maximised using ordinary methods (say, `glm` in R) omitting any observations of the first kind. In the exceptional event that the remaining likelihood is maximised for some value of $\beta_1 < 0$ (which would indicate that the model is wrong), I believe the overall MLE of $\beta_1$ (given all the data) would be "some small but positive value". – Jarle Tufto Apr 19 '17 at 15:20
  • Thanks! (+1). Might be useful to include these comments in the answer. – Scortchi - Reinstate Monica Apr 19 '17 at 16:16
  • @JarleTufto But do you mean to say that taking the log of the input and then estimating the coefficients will do the trick? In that case, how do we take the log of an input whose value is 0? And what is your take on "Can we make a condition in the output that when all the inputs are zero then the probability is zero (p_net = p(x) for x > 0 and p_net = 0 if all x = 0), or can we subtract the probability with the probability when all inputs are zero (like p_net = p(x) - p(x=0))"? – prashanth Apr 20 '17 at 12:28
  • I added some further explanation to the answer. With regard to your other question, it is not clear to me why you want the model to predict zero probability only when all covariates are zero rather than when any one of your covariates is zero. Based on what mechanism would you expect this? – Jarle Tufto Apr 20 '17 at 13:23
  • @JarleTufto Thank you for the explanation. Consider a case where I am predicting the likelihood of a disease with respect to relevant covariates, and I deploy a logistic model in a system. When I run the model with all covariates equal to zero, it still gives me a probability, which says that even when all covariates are zero there is some x% chance of disease. I just wanted to remove this bias. – prashanth Apr 21 '17 at 06:13