Oh that link is a classic example of someone over-complicating statistics when it doesn't need to be!
Logistic regression is used in classification problems, i.e. predicting what category an observation falls into. So for example, predicting the weather tomorrow when the options are "hot/cold/rain". In the binary case, this reduces to a two-class classification problem and the example above becomes: predict the weather when the options are "hot/cold". The variable you are trying to predict is known as the 'response'; your prediction of it gets compared against the observed value, the 'truth'.
Now for a slight digression: a random variable X follows a Bernoulli distribution with parameter p if the following holds:
P(X = 1) = p
P(X = 0) = 1 - P(X = 1) = 1 - p
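If it helps to see that concretely, here's a quick numpy sketch (a toy example of my own, with an arbitrary p) showing that the empirical frequencies of a Bernoulli variable land near p and 1 - p:

```python
import numpy as np

p = 0.3                                         # hypothetical success probability
rng = np.random.default_rng(seed=0)
draws = rng.binomial(n=1, p=p, size=100_000)    # Bernoulli = Binomial with n = 1

print(draws.mean())       # ~0.3, an estimate of P(X = 1)
print(1 - draws.mean())   # ~0.7, an estimate of P(X = 0) = 1 - p
```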
Hopefully you will have seen before that in categorical prediction the assignment of labels is arbitrary: you could predict tomorrow's weather as 'hot' or 'cold', but equivalently you could predict it as '1' or '0', as long as you know that '1' corresponds to 'hot' and '0' corresponds to 'cold'.
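Just to make that mapping concrete (the choice of which label gets the 1 is entirely mine and entirely arbitrary):

```python
labels = ["hot", "cold", "hot", "hot", "cold"]
mapping = {"hot": 1, "cold": 0}   # could equally be {"hot": 0, "cold": 1}
encoded = [mapping[label] for label in labels]
print(encoded)                    # [1, 0, 1, 1, 0] -- this is what the model actually sees
```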
Logistic regression assumes the response is conditionally Bernoulli distributed given the values of the features.
This says that, given the features, the response follows a Bernoulli distribution, which means you only need to predict P(Weather = "hot") or P(Weather = "cold") but not both, because P(Weather = "hot") = 1 - P(Weather = "cold"). And the statement about conditionality just means that the logistic regression model uses a matrix of features to make its prediction (exactly the same setup as in linear regression).
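Here's a rough sklearn sketch (made-up data; the "two weather features" framing is just my own illustration) showing that the fitted model outputs a pair of probabilities that sum to 1, so predicting one of them determines the other:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Fake data: two features and a 0/1 label where 1 = "hot", 0 = "cold"
rng = np.random.default_rng(seed=0)
X = rng.normal(size=(200, 2))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=200) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# Each row is [P(y = 0), P(y = 1)] for one observation; the columns sum to 1
probs = model.predict_proba(X[:5])
print(probs)
print(probs.sum(axis=1))   # all 1.0
```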
What about binary logistic regression? Do we have to care about the error term when doing inference on coefficients?
It requires fewer assumptions than linear regression, but still a few. It does not require a linear relationship between the features and the raw response, normally distributed errors, or homoscedasticity; there is no additive error term at all, so inference on the coefficients comes from the Bernoulli likelihood rather than from assumptions about residuals (it does assume the log-odds are linear in the features, though). It does require independent observations (which is almost always an assumption) and, preferably, independent features or at least little multicollinearity (again, a common assumption).
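If you want to sanity-check the "little multicollinearity" part, one common approach is variance inflation factors. Here's a rough statsmodels sketch with made-up data (the columns and the usual ~5-10 rule of thumb are my own choices, not gospel):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(seed=0)
x1 = rng.normal(size=500)
x2 = 0.9 * x1 + rng.normal(scale=0.3, size=500)   # deliberately correlated with x1
x3 = rng.normal(size=500)
X = sm.add_constant(np.column_stack([x1, x2, x3]))  # intercept column at index 0

# VIF per feature (skip the constant); values well above ~5-10 are often
# taken as a warning sign of problematic collinearity
for i, name in enumerate(["x1", "x2", "x3"], start=1):
    print(name, variance_inflation_factor(X, i))
```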