8

Suppose we have a logistic regression model:

$$\begin{align} P(y=1\vert\mathbf{x}) &= p \\ \log\left(\frac{p}{1-p}\right) &= \boldsymbol{\beta}\mathbf{x} \end{align}$$

Given a random sample $D=\{\mathbf{X},\mathbf{y}\}$ of size $N$, we can compute confidence intervals for the $\boldsymbol{\beta}$ and correspondingly prediction intervals for $p$, given a certain value $\mathbf{x}^*$ of the predictor vector. This is all very standard and detailed, for example, here.

Suppose instead that I'm interested in a prediction interval for $y$, given $\mathbf{x}^*$. Of course, it doesn't make any sense at all to compute a prediction interval for a single realization of $y$, because $y$ can only take the values 0 and 1, and no value in between. However, if we consider $m$ realizations of $y$ for the same fixed value of $\mathbf{x}^*$ , then this becomes similar (but not identical) to the question of computing a prediction interval for a binomial random variable. This is basically the same situation described by Glen_b in the comments to this answer. Does this question have an answer, apart from the trivial one "use nonparametric bootstrap"?

DeltaIV
  • 15,894
  • 4
  • 62
  • 104
  • can you compute a prediction interval for $log(p / (1-p))$ instead perhaps? – Hugh Perkins Mar 31 '18 at 01:15
  • 2
    @HughPerkins I think that the issue is how to combine the uncertainty in _p_ with the uncertainty in binomial sampling also given the uncertainty in _p_. Is there a closed-form solution? – EdM Mar 31 '18 at 02:17
  • @EdM you got my point. I wonder if there is a closed form solution or an analytical approximation. – DeltaIV Mar 31 '18 at 12:03
  • 1
    [offtopic] random idea, it occurs to me that it could be interesting to have a tag like 'open-research-opportunity' for questions like this which/if they are answered in the negative – Hugh Perkins Mar 31 '18 at 16:05

1 Answers1

4

One way this should work without bootstrapping (which in practice may be the fastest thing tho implement), would be:

  1. Assume that a normal approximation for the predicted log-odds ($x \hat{\beta}$) plus/minus its standard error works. Any logistic regression software will provide this.
  2. The percentiles of this distribution transform to probabilities via the anti-logit.
  3. One can find a (mixture of) beta distribution(s) that approximates the predictive distribution for the probability well.
  4. The predictive distribution for the outcome is then a (mixture of) beta-binomial distribution(s with the same mixing weights as used in step 3).

Alternatively, one can "just" integrate out the log-odds from the joint predictive of outcome and log-odds, but I believe that will be a complete mess with no closed form solution.

Björn
  • 21,227
  • 2
  • 26
  • 65
  • 4
    You could also just directly simulate from the asymptotic multivariate normal for $\beta-\hat{\beta}$, and then form a mixture of binomials over those values. – Glen_b Apr 01 '18 at 06:28
  • I like the overall idea, but I'm not sure about the details. For example, "find a (mixture of) beta distribution(s) that approximates the predictive distribution for the probability well", How do you this in practice? Could you add an example? Even a low-dimensional one would suffice. – DeltaIV Apr 02 '18 at 06:07
  • About your second alternative, if I understand correctly you suggest to write down the joint posterior distribution of outcomes and log-odds (which is a joint distribution between continuous and discrete variables) and integrate out log-odds. I definitely agree that that's going to be a mess. One could salvage the approach using MCMC to estimate the integral, I think. However, I would need MCMC algorithms which work for discrete-continuous variables...I'm not sure which one do. I think `rstan` (my tool of choice for MCMC) doesn't. – DeltaIV Apr 02 '18 at 06:11
  • The proposal by Glen_b is of course very easy: simulate from MVN for the betas, form log - odds, convert too probability, simulate binary outcomes. – Björn Apr 02 '18 at 06:14
  • Bjorn, the same question holds for you. How many components should I use in the mixture of binomials? How do I estimate $\alpha_i$ and $\beta_i$ for each component of the mixture? I'm not very familiar with density estimation, unfortunately. Even a reference to a useful package would be enough: I don't expect you two to lay down the theory of beta mixture estimation for me (that would be another question). – DeltaIV Apr 02 '18 at 06:14
  • @DeltaIV My suggestion has the same number of components as values you simulated from the MVN; let's say you have a single prediction to create and want to generate $N$ simulations, then you can set up $n+1$ bins to hold the distribution at any value of $\mathbf{x}$ you're predicting for. At each step you simulate your parameter difference vector $\beta-\hat{\beta}$ and then you have a conditional binomial for the prediction. You multiply that conditional pmf by 1/N and add that into the $n+1$ probabilities you're collecting. (These days this simulation would be called a parametric bootstrap.) – Glen_b Apr 02 '18 at 07:34
  • However, if you only had a single predicted value, you can of course shortcut this to essentially a numerical integration\*. For that, compute instead the distribution of $[\beta-\hat{\beta}]x$ (keeping in mind this is 'backwards', then shift it by the constant $\hat{\beta}x$), and across a fine grid of values over that distribution do a probability-weighted version of the suggestion in the previous comment. – Glen_b Apr 02 '18 at 07:48
  • @Glen_b sorry - my comments didn’t make sense. I kept mixing betas, binomials & beta-binomials in my mind. I’m on holiday now and apparently my mind is on holiday too :-) Once I’m back I’ll try to rewrite your comments as an answer and implement it into an R code - not b/c I plan to accept it, but just because this way I’ll be forced to follow the train of thoughts. Thanks to both of you – DeltaIV Apr 03 '18 at 10:15
  • 3
    I can write this up as something in the form of an answer if you prefer -- I don't mind either way. – Glen_b Apr 03 '18 at 14:58
  • 2
    @Glen_b I'd really appreciate that. – DeltaIV Apr 06 '18 at 18:25
  • 1
    @Glen_b, I would be interested in seeing that answer. – Richard Hardy Jun 24 '19 at 08:56
  • @DeltaIV, I would be interested in seeing that answer. – Richard Hardy Jun 24 '19 at 08:56