
My question is very simple and probably banal, but I can't understand this concept and I found nothing about it on the internet.

Consider a logistic/logit model, for example with 3 covariates. We want to test the hypothesis that a model without a variable is preferable. We can do this test with the LRT.

My question is: when a model fits the data better, is the log-likelihood expected to be higher or lower, and why?

For example, if the model with 3 variables is preferable to the one with only 2, and we calculate the log-likelihood of both models (the reduced model and the complete model), which is expected to be higher?

Jenny
  • Another version: https://stats.stackexchange.com/questions/167827/why-is-sum-of-squared-residuals-non-increasing-when-adding-explanatory-variable/167832#167832 – kjetil b halvorsen Apr 29 '21 at 13:48

2 Answers


The maximum over a restricted set is mathematically no larger than the maximum over the full set. You can view the maximized likelihood of a model with fewer regressors as a maximum over a restricted set.

Specifically, if you have three regressors, the parameters are $(\beta_0, \beta_1, \beta_2, \beta_3)$, and the maximized likelihood is the maximum over all possible combinations of $(\beta_0, \beta_1, \beta_2, \beta_3)$. The restricted model with only one regressor (say $X_1$) has its likelihood maximized over the same set of combinations, but restricted so that $\beta_2 = \beta_3 = 0$. The maximum over the restricted set is no larger than the maximum over the unrestricted set; in most cases it is strictly smaller.

Just because the maximized likelihood is smaller does not necessarily mean the model is worse, though. Since this is a mathematical fact, the unrestricted model will ordinarily have a higher maximized likelihood even when $\beta_2 = \beta_3 = 0$ in reality. The likelihood ratio test addresses precisely this issue, providing a reasonable answer to whether the difference in maximized likelihoods is explainable by chance alone.
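
To make this concrete, here is a minimal R sketch with simulated data (the variable names `x1`, `x2`, `x3` are illustrative, not from the question). It shows that the full model's maximized log-likelihood is never below the reduced model's, even when the extra coefficients are truly zero, and how the likelihood ratio test judges whether the gap exceeds chance:

```r
set.seed(1)
n  <- 500
x1 <- rnorm(n); x2 <- rnorm(n); x3 <- rnorm(n)
# True model: only x1 matters, i.e. beta2 = beta3 = 0 in reality
y  <- rbinom(n, 1, plogis(-0.5 + x1))

reduced <- glm(y ~ x1,           family = binomial)
full    <- glm(y ~ x1 + x2 + x3, family = binomial)

logLik(reduced)  # never exceeds logLik(full), even though x2, x3 are noise
logLik(full)

# Likelihood ratio test: is the increase explainable by chance alone?
anova(reduced, full, test = "Chisq")
```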

Even if $\beta_2 \neq 0$ or $\beta_3 \neq 0$, the model with only $X_1$ still might be better; penalized likelihood and out-of-sample predictions address this issue.
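
A sketch of those two remedies, under the same simulated setup (re-simulated here so the snippet stands alone): penalized likelihood via AIC/BIC, and an out-of-sample comparison on a holdout set.

```r
set.seed(1)
n <- 500
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(-0.5 + d$x1))

train <- sample(n, 400)
fit_r <- glm(y ~ x1,           family = binomial, data = d[train, ])
fit_f <- glm(y ~ x1 + x2 + x3, family = binomial, data = d[train, ])

# Penalized likelihood: each extra parameter is charged a penalty,
# so the smaller model can win despite its lower raw log-likelihood.
AIC(fit_r, fit_f)
BIC(fit_r, fit_f)

# Out-of-sample log-likelihood on the 100 held-out observations
oos_ll <- function(fit) {
  p <- predict(fit, newdata = d[-train, ], type = "response")
  sum(dbinom(d$y[-train], 1, p, log = TRUE))
}
c(reduced = oos_ll(fit_r), full = oos_ll(fit_f))
```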

BigBendRegion

The log-likelihood of a model with more covariates will always be at least as large as that of a model with fewer covariates, provided the models are nested. The reason is simple: if $\alpha \subset \beta$, then

$$\max_{\alpha}L(\alpha) \leq \max_{\beta} L(\beta).$$

However, the question is: how much gain is there in adding the covariates that are in $\beta$ but not in $\alpha$?

The answer is not straightforward, but you can use:

  • Akaike or Bayesian information criteria. These penalize the number of parameters.
  • LASSO, which is implemented in the R package glmnet.
  • Likelihood ratio test.
  • As an informal check, you can also fit the full model using glm in R and look at the p-values of the additional variables (a sketch of this and the glmnet route follows this list).
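
A minimal sketch of the last two routes, assuming the glmnet package is installed (simulated data with illustrative names; the LASSO penalty typically shrinks the noise coefficients to exactly zero):

```r
library(glmnet)

set.seed(1)
n <- 500
X <- matrix(rnorm(n * 3), n, 3, dimnames = list(NULL, c("x1", "x2", "x3")))
y <- rbinom(n, 1, plogis(-0.5 + X[, "x1"]))

# LASSO-penalized logistic regression; cv.glmnet chooses the penalty
# strength by cross-validation.
cvfit <- cv.glmnet(X, y, family = "binomial")
coef(cvfit, s = "lambda.1se")  # x2 and x3 are typically dropped

# Informal check: p-values of the additional covariates in a plain glm
d <- data.frame(y, X)
summary(glm(y ~ x1 + x2 + x3, family = binomial, data = d))
```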
volvo