
I'd like to predict a variable that is bounded between 0 and 1; the values are patients' responses on a visual analog scale. When I use a simple linear model, some predictions fall outside the allowed range, which I'd like to avoid. It occurred to me to fit a Gaussian glm with a logit link. However, I ran into trouble because logit(0) = -Inf and logit(1) = Inf. If I simply recode 0 as 0.001 and 1 as 0.999, the model runs fine.

My questions are:

  1. Is the glm(..., family = gaussian(link = "logit")) appropriate for that kind of data?
  2. Is there another, more appropriate way to circumvent the Inf/-Inf problems?
  3. How could I calculate prediction intervals from that model?
Firebug
hanshansen

1 Answer


Look into fractional logistic regression, which I suspect is what you're reaching for intuitively. You already know that linear regression can give predictions that are out of bounds. You may not have realized that it is also inefficient here: with an outcome bounded in [0, 1], the error variance cannot be constant, because it must shrink as the mean approaches either bound.

For example, see the question "How to do logistic regression in R when outcome is fractional (a ratio of two counts)?"
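A minimal sketch of a fractional logit in R, on simulated data. The variable names (vas, age) and the data-generating process are assumptions for illustration, not the asker's data. It relies on base R's quasibinomial family, which accepts non-integer responses in [0, 1], so exact 0s and 1s need no recoding.

```r
# Simulated data: a hypothetical slider score on 0-100, rescaled to [0, 1].
set.seed(1)
n   <- 200
age <- runif(n, 20, 80)                  # hypothetical predictor
mu  <- plogis(2.5 - 0.02 * age)          # assumed true mean, inside (0, 1)
vas <- round(100 * rbeta(n, mu * 10, (1 - mu) * 10)) / 100  # may hit exact 0 or 1

# Fractional logit: Bernoulli quasi-likelihood with a logit link.
# quasibinomial() allows fractional responses, including exact 0 and 1.
fit <- glm(vas ~ age, family = quasibinomial(link = "logit"))

# Predictions on the response scale always lie strictly inside (0, 1).
new_data <- data.frame(age = c(25, 50, 75))
pred <- predict(fit, newdata = new_data, type = "response")
print(pred)
```

The Papke and Wooldridge approach pairs this estimator with heteroskedasticity-robust standard errors. In R these are commonly obtained with the sandwich package, which this sketch does not use.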

Beta regression is another possibility. It's always possible that neither is appropriate, because you haven't given much information about the nature of the dependent variable (maybe ordered logit is more appropriate?), so you will have to judge for yourself.
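A hedged sketch of the beta regression alternative, again on simulated data with hypothetical names (vas, age). It assumes the contributed betareg package is installed. Because the beta distribution is defined only on the open interval (0, 1), exact 0s and 1s are first squeezed inward with the commonly used transformation (y * (n - 1) + 0.5) / n. That transformation is an assumption added here, not something the answer prescribes.

```r
# Assumes the contributed package 'betareg' is installed:
# install.packages("betareg")
library(betareg)

# Simulated slider scores rescaled to [0, 1] (hypothetical data).
set.seed(1)
n   <- 200
age <- runif(n, 20, 80)
mu  <- plogis(2.5 - 0.02 * age)
vas <- round(100 * rbeta(n, mu * 10, (1 - mu) * 10)) / 100

# Beta regression needs 0 < y < 1, so squeeze exact 0s and 1s inward.
vas_sq <- (vas * (n - 1) + 0.5) / n

dat <- data.frame(vas_sq = vas_sq, age = age)
fit_beta <- betareg(vas_sq ~ age, data = dat)  # logit link for the mean by default

# Predicted means on the response scale stay inside (0, 1).
pred_beta <- predict(fit_beta, newdata = data.frame(age = c(25, 50, 75)),
                     type = "response")
print(pred_beta)
```

Unlike the recoding to 0.001 and 0.999, the squeeze depends on the sample size and shrinks all values slightly toward 0.5, so results can be sensitive to it when many observations sit exactly on a bound.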

The Laconic
  • The dependent variable is simply an overall judgement of the respondent's health. It was collected with a slider ranging from 0 to 100 with a step width of 1. Most people score somewhere between 70 and 90, so the responses are not really normally distributed. For an ordered logit, I feel there are too many possible response values. – hanshansen Oct 31 '16 at 06:30
  • Agreed, if you can really take the precision of those measures seriously. If this was a question "how's your overall health on a numeric scale?", then choosing a step size of 1 on a scale of 1-100 is meaningless precision. You might as well bucket into ranges 1-10, 11-20, etc., for example. – The Laconic Oct 31 '16 at 12:11
  • But again, fractional logit and beta regression are alternatives. For fractional logit, a good short reference is Christopher Baum's "Modeling proportions" in the Stata Journal. Yes, examples are in Stata, but it's a good explanation of the idea anyway. The original paper on the method is Papke, L. E., and J. M. Wooldridge. 1996. Econometric methods for fractional response variables with an application to 401(K) plan participation rates. Journal of Applied Econometrics 11: 619–632. – The Laconic Oct 31 '16 at 12:15