0

I am wondering how the residuals of a logistic regression model should be distributed.

Of course, when you fit a linear regression model under the Normality assumption, the residuals from that model should be approximately Normally distributed with mean $\mu = 0$ and constant variance $\sigma^2$ (standardized residuals should then have standard deviation close to $1$).

But what if you fit another kind of regression with a different distributional assumption, for instance a logistic one?

Suppose one fits a logistic regression model: what distribution should the residuals have?

Moreover, which test should I run to check the validity of the distributional assumption?

Any hint, reference or whatever will be appreciated.

Quantopik
  • There are quite a few (answered) questions on this already, including [What is the expected distribution of residuals in a generalized linear model?](http://stats.stackexchange.com/q/57044/17230), [Interpreting residual diagnostic plots for glm models?](http://stats.stackexchange.com/q/29271/17230), & [Checking residuals for normality in generalised linear models](http://stats.stackexchange.com/q/92394/17230). Please have a look & consider editing your question to focus on anything you're still unclear on. – Scortchi - Reinstate Monica Jun 08 '15 at 11:01
  • Ok @Scortchi, I will edit the question in order to make that clearer! Thanks for the advice! – Quantopik Jun 08 '15 at 11:50

1 Answer

-1

Tweedie distributions are a family of probability distributions which include the purely continuous normal and gamma distributions, the purely discrete scaled Poisson distribution, and the class of mixed compound Poisson–gamma distributions which have positive mass at zero, but are otherwise continuous. For any random variable Y that obeys a Tweedie distribution, the variance var(Y) relates to the mean E(Y) by the power law,

$$\text{Var}(Y) = a[E(Y)]^p$$

where a is a scaling parameter and p the tail index parameter.

They include a number of distributions, each being specified by the domain of p:

  • normal distribution, p = 0,
  • Poisson distribution, p = 1,
  • compound Poisson–gamma distribution, 1 < p < 2,
  • gamma distribution, p = 2,
  • positive stable distributions, 2 < p < 3,
  • inverse Gaussian distribution, p = 3,
  • positive stable distributions, p > 3, and
  • extreme stable distributions, p = ∞

For 0 < p < 1 no Tweedie model exists.

(Wikipedia)
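As a rough illustration of the power law quoted above, the following minimal sketch evaluates $\text{Var}(Y) = a[E(Y)]^p$ for a given mean and power. The function name `tweedie_variance`, the lookup table, and the positivity check on the mean are illustrative assumptions and do not come from the quoted article.

```python
# Named members of the family listed above, keyed by the power p.
# (Illustrative lookup only; the ranges such as 1 < p < 2 are not included.)
TWEEDIE_MEMBERS = {0: "normal", 1: "Poisson", 2: "gamma", 3: "inverse Gaussian"}


def tweedie_variance(mean, p, a=1.0):
    """Return a * mean**p, the variance implied by the power law Var(Y) = a[E(Y)]^p."""
    if 0 < p < 1:
        # The quoted text states that no Tweedie model exists in this range.
        raise ValueError("no Tweedie model exists for 0 < p < 1")
    if p != 0 and mean <= 0:
        # Simplifying assumption: require a positive mean outside the normal case.
        raise ValueError("mean must be positive when p != 0")
    return a * mean ** p


if __name__ == "__main__":
    for p, name in sorted(TWEEDIE_MEMBERS.items()):
        print(name, tweedie_variance(4.0, p))
```

With `p = 0` the variance does not depend on the mean (the normal case), while `p = 1` makes the variance proportional to the mean, as for the Poisson.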

Tail indexes can be estimated via standard metrics such as the Hill and Pickands estimators, but Xavier Gabaix's heuristic using OLS regression and log-ranks is pretty straightforward and has the advantage of not requiring numerical integration.

See http://en.wikipedia.org/wiki/Tweedie_distribution for a general overview of Tweedies, and Gabaix and Ibragimov, "Rank − 1/2: A Simple Way to Improve the OLS Estimation of Tail Exponents", 2009.
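The log-rank regression mentioned above can be sketched as follows. This is a minimal illustration, assuming the data follow a Pareto-type tail: sort the observations in decreasing order, regress $\log(\text{rank} - 1/2)$ on $\log(\text{size})$ by OLS, and take minus the slope as the tail exponent. The function names, the simulated sample, and the use of the whole sample (rather than only the upper tail) are my own assumptions; the standard error uses the approximation $b\sqrt{2/n}$ associated with this estimator.

```python
import math
import random


def rank_half_tail_exponent(values):
    """Estimate a tail exponent by OLS of log(rank - 1/2) on log(size)."""
    xs = sorted((v for v in values if v > 0), reverse=True)
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two positive observations")
    log_size = [math.log(x) for x in xs]
    # Rank 1 is the largest observation; the 1/2 shift is the bias correction.
    log_rank = [math.log(rank - 0.5) for rank in range(1, n + 1)]
    mean_s = sum(log_size) / n
    mean_r = sum(log_rank) / n
    sxx = sum((s - mean_s) ** 2 for s in log_size)
    if sxx == 0:
        raise ValueError("observations must not all be equal")
    sxy = sum((s - mean_s) * (r - mean_r) for s, r in zip(log_size, log_rank))
    exponent = -sxy / sxx                       # minus the OLS slope
    std_error = exponent * math.sqrt(2.0 / n)   # asymptotic approximation
    return exponent, std_error


def demo(seed=0, n=5000, alpha=2.0):
    """Apply the estimator to a simulated Pareto sample with known exponent."""
    rng = random.Random(seed)
    sample = [rng.paretovariate(alpha) for _ in range(n)]
    return rank_half_tail_exponent(sample)


if __name__ == "__main__":
    print(demo())
```

In practice the regression is usually restricted to the largest observations, since the power-law behaviour is only claimed for the tail; the choice of cutoff is left to the reader here.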

All of that said, and specifically with respect to logistic regression residual diagnostics, I am aware of only one paper that treats this topic in any depth: Daryl Pregibon's *Logistic Regression Diagnostics*. https://projecteuclid.org/euclid.aos/1176345513
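To make the idea of a logistic regression residual concrete, here is a minimal sketch of the two textbook residuals for a binary outcome, the Pearson residual $(y - \hat p)/\sqrt{\hat p(1-\hat p)}$ and the deviance residual. The function names are illustrative, the fitted probability `p` is assumed to come from an already fitted model, and the notation is not necessarily that of the paper. Because $y$ is either 0 or 1, each residual can take only two values for a given fitted probability, so there is no reason to expect these residuals to look Normal.

```python
import math


def _check(y, p):
    """Validate a binary outcome and a fitted probability strictly inside (0, 1)."""
    if y not in (0, 1):
        raise ValueError("y must be 0 or 1")
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")


def pearson_residual(y, p):
    """Raw residual y - p scaled by the binomial standard deviation."""
    _check(y, p)
    return (y - p) / math.sqrt(p * (1.0 - p))


def deviance_residual(y, p):
    """Signed square root of the observation's contribution to the deviance."""
    _check(y, p)
    log_lik = math.log(p) if y == 1 else math.log(1.0 - p)
    return math.copysign(math.sqrt(-2.0 * log_lik), y - p)


if __name__ == "__main__":
    # Hypothetical fitted probability of 0.8 with each possible outcome.
    for y in (0, 1):
        print(y, pearson_residual(y, 0.8), deviance_residual(y, 0.8))
```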

Scortchi - Reinstate Monica
Mike Hunter
  • 2
    Thanks for the answer @MikeHunter, but my question was about the residuals distribution! For instance, what is the distribution of the residuals coming up the logistic distribution? Anyway, +1 because I did not know those things about distributions :) – Quantopik Jun 07 '15 at 13:23
  • @Quantopic Thanks and apologies for the lack of completeness in my answer. Here's another way of defining Tweedie's: "Tweedie distributions are a special case of exponential dispersion models, a class of models used to describe error distributions for the generalized linear model." That said, you asked for both the *Poisson* and the *logistic* as exemplars. You got the wider class to which they belong. I don't feel like I shortchanged you with my answer in any way, but you're the best judge of that. – Mike Hunter Jun 07 '15 at 13:33
  • Ok @MikeHunter! Thanks again for the link too! – Quantopik Jun 07 '15 at 15:53
  • 2
    This doesn't seem to answer the question - on residual diagnostics - at all. – Scortchi - Reinstate Monica Jun 08 '15 at 11:13
  • 1
    @scortchi With all due respect because you may be a moderator, your comment is amazingly uninformed and seems to rest on an extremely narrow, literal interpretation of the question. I stand by my response since, as the OP noted, it expanded their knowledge of how residuals work. – Mike Hunter Jun 08 '15 at 11:20
  • 1
    Did the OP really note that it expanded their knowledge of how *residuals* work? To me they seemed to be saying "This doesn't answer my question, but thanks - it's interesting anyway". Which is also my view: to expect an answer to a question entitled "Residual diagnostics after a regression model" to contain some discussion of residuals or diagnostics or regression is surely not unreasonable. I was going to suggest your expanding on some of these points or to see if we could find a better home for your answer. – Scortchi - Reinstate Monica Jun 08 '15 at 11:52
  • 1
    @Scortchi Again, I think you're a moderator, so that gives you the power to do whatever you like. That doesn't make it right, much less fair. I think your question is really a more general one having to do with the family of generalized extreme value distributions, *in general*. What you're asking for is, in effect, a dissertation-like exposition on how these things relate to the OP's question. While there are those on this site that treat every answer as a dissertation opportunity, I'm not one of them. That is my choice. You might spend some time getting up to speed on this class of models. – Mike Hunter Jun 08 '15 at 12:02
  • 5
    The wording "with all due respect" doesn't cancel out "amazingly uninformed". I can't see that this answers the question either. I don't think saying that is being narrow or literal at all; your answer could be a good answer to a very different question, but not this one. – Nick Cox Jun 08 '15 at 12:04
  • 4
    @MikeHunter: What I'd *like* to do is keep the site well-organized & all contributors to it happy; it remains to be seen whether I have that power. Did you know you can ask & answer your own questions? - might be a good idea in this case, & would make your answer more prominent. (The Pregibon paper is an excellent choice by the way; a little more on that would constitute a great answer by itself. But neither the OP nor I were asking anything about generalized extreme value distributions.) – Scortchi - Reinstate Monica Jun 08 '15 at 13:05
  • 3
    Mike, the first half of this post appears to have been copied wholesale from the [Wikipedia article on Tweedie distributions](http://en.wikipedia.org/wiki/Tweedie_distribution) without clear attribution. (Merely mentioning that article later does not suffice.) As such it misleads the reader by implicitly representing other peoples' work as your own. I have therefore edited your post to make it clear which part is from this article. Please see http://stats.stackexchange.com/help/referencing for our policy about referencing material. – whuber Jun 08 '15 at 13:15
  • Mike: You may not edit this answer to undo @whuber's edit correctly characterizing the first half as a quote from Wikipedia. – Scortchi - Reinstate Monica Jun 08 '15 at 15:16
  • Whatevs. I've seriously considered redacting my answer but it appears that at least a few people found it useful. Please note that my original answer was to an original question that has since been as heavily edited as my answer has been. When weighed against the original, less precisely specified question, it addressed that question but not the new, edited question. – Mike Hunter Jun 08 '15 at 16:09
  • @Quantopic, just for the record the link to the Pregibon article was from me, not Whuber. – Mike Hunter Jun 08 '15 at 16:09
  • @whuber Thanks for the edits and correcting the attribution to Wiki – Mike Hunter Jun 08 '15 at 16:26
  • Sorry @MikeHunter! I confused the nicknames! Thank you! – Quantopik Jun 08 '15 at 16:57
  • @Quantopic Np...but do the right thing and give me a point for the correct answer – Mike Hunter Jun 08 '15 at 17:03
  • @MikeHunter, your answer has been already marked as the right answer and I gave you +1 already when you posted the first version of your answer. Rather, could you eliminate the duplicate question tag? – Quantopik Jun 08 '15 at 17:12
  • 1
    @Quantopic In that case, thank you! This has proven to be such a teeth-pulling, painful exercise in pedantry that I failed to notice. But, then, all too many of the threads on this SE site are like that. As for the dupe, that's not mine. I don't have enough points to do anything like that. – Mike Hunter Jun 08 '15 at 17:22
  • 4
    Mike, I fear you might be confusing pedantry with a desire for clear honest communication. This site is not just about answering questions: it aims to curate the Q's and the A's both. For this to be possible and useful, it is necessary that all posts have a good chance of being understood reliably by all interested readers. I encourage you to interpret the comments and mod. interventions in this thread (and any other) in that light. Otherwise they will indeed seem to be mere pedantry, you could get discouraged, and we would (regretfully) lose someone with exceptional knowledge and experience. – whuber Jun 08 '15 at 18:56
  • @whuber Thanks for the comment and, not to be a troll about it, my experience has honed me to a fine edge when it comes to dealing with the occupational hazards of a life lived among things quantitative. Like everybody else, I know what I know and, as it happens, have gone down some paths that have expanded beyond the trad texts. I take pains to try and communicate with a modicum of rigor and thoroughness, I'm just not into lemmas. The "curation" you refer to walks a fine line between helpful clarification and gratuitous, obsessive pedantry. Crowdsourced or not, that's a drawback to all SE. – Mike Hunter Jun 08 '15 at 19:21