Is a GLM with one continuous variable taking 4 levels nested in a GLM with 3 dummy variables?

Question

Y is a binary variable taking the values 0 and 1.

X is a variable with 4 levels: 0, 1, 2, 3.

We fit a logistic model A treating X as a continuous variable. Then we fit a logistic model B treating X as a categorical variable, which gives 3 dummy variables.

We want to know whether B improves the model fit significantly, so we use a likelihood ratio test.

However, I assume that one model has to be nested in the other in order to use a likelihood ratio test.

So is A nested in B? It seems to me that A's terms are not a subset of B's terms.

I would also like to know whether the following pairs are nested:
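A minimal sketch of such a likelihood ratio test in pure Python, using hypothetical grouped data (the counts below are made up, not taken from the question). Model A is fitted by Newton-Raphson on the logit scale; model B needs no iteration because, with one free parameter per level of X, its maximum likelihood estimate is simply the observed proportion of Y = 1 at each level. Assuming A is nested in B, the statistic is compared with a chi-square distribution with 4 - 2 = 2 degrees of freedom.

```python
import math

# Hypothetical data (NOT from the question): for each level x in {0, 1, 2, 3},
# the number of observations with y = 1 and the total number of observations.
successes = {0: 4, 1: 9, 2: 11, 3: 12}
totals = {0: 20, 1: 20, 2: 20, 3: 20}


def binom_loglik(probs):
    """Bernoulli log-likelihood summed over all observations, grouped by level."""
    ll = 0.0
    for x, p in probs.items():
        k, n = successes[x], totals[x]
        ll += k * math.log(p) + (n - k) * math.log(1 - p)
    return ll


def fit_model_a(iters=50):
    """Model A: logit P(Y=1) = a0 + a1 * x, fitted by Newton-Raphson."""
    a0, a1 = 0.0, 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x in totals:
            p = 1 / (1 + math.exp(-(a0 + a1 * x)))
            k, n = successes[x], totals[x]
            r = k - n * p          # score contribution of level x
            w = n * p * (1 - p)    # information weight of level x
            g0 += r
            g1 += r * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        # Newton step: (a0, a1) += H^{-1} g, with H the 2x2 information matrix
        a0 += (h11 * g0 - h01 * g1) / det
        a1 += (h00 * g1 - h01 * g0) / det
    return a0, a1


def lr_test():
    a0, a1 = fit_model_a()
    probs_a = {x: 1 / (1 + math.exp(-(a0 + a1 * x))) for x in totals}
    # Model B has one free parameter per level, so its MLE is the per-level proportion.
    probs_b = {x: successes[x] / totals[x] for x in totals}
    stat = 2 * (binom_loglik(probs_b) - binom_loglik(probs_a))
    df = 4 - 2  # number of parameters in B minus number in A
    # With 2 degrees of freedom the chi-square survival function is exp(-stat / 2).
    p_value = math.exp(-stat / 2)
    return stat, df, p_value


stat, df, p_value = lr_test()
print(f"LR statistic = {stat:.3f}, df = {df}, p = {p_value:.3f}")
```

The statistic cannot be negative here because B's parameter space contains every linear predictor A can produce, so B's maximized log-likelihood is at least A's. In practice one would fit both models with a GLM routine (for example R's `glm` followed by `anova(modelA, modelB, test = "Chisq")`) rather than hand-coding the fit.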

Model A: $y = a + b +c$, Model B: $y = ab$
Model A: $y = a^2 + a^3$, Model B: $y = a$

kjetil b halvorsen · Accepted Answer · 2020-02-19T19:23:16.107

To decide whether model A is nested in model B, it is not enough to compare the symbolic model structure (see "What is a 'symbolically nested' model?"). What matters is that, for every set of parameter values in A, we can find parameters for B that give the same predicted values. That is clearly the case for your main example (continuous X versus three dummy variables), so A is nested in B, although not symbolically nested.

For the two additional examples:

Model A: $y = a + b + c$ vs. Model B: $y = ab$: neither model is nested in the other.
Model A: $y = a^2 + a^3$ vs. Model B: $y = a$: neither model is nested in the other.
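To see why these pairs are not nested, a single counterexample in each direction is enough: nesting requires that every parameter choice of one model be matched by the other at all inputs. The sketch below checks this with exact arithmetic, assuming (this is not stated in the question) that each term carries its own free coefficient and that there is no intercept.

```python
from fractions import Fraction


def solve_2x2(m, rhs):
    """Solve a 2x2 linear system exactly by Cramer's rule."""
    (p, q), (r, s) = m
    det = p * s - q * r
    return (s * rhs[0] - q * rhs[1]) / det, (p * rhs[1] - r * rhs[0]) / det


# Pair 2, assumed forms: A: y = c1*a^2 + c2*a^3, B: y = b1*a.
# Is B nested in A? Take B with b1 = 1. Matching it at a = 1 and a = 2
# forces a unique (c1, c2); then check the remaining point a = 3.
c1, c2 = solve_2x2([[Fraction(1), Fraction(1)], [Fraction(4), Fraction(8)]],
                   [Fraction(1), Fraction(2)])
a_at_3 = c1 * 3**2 + c2 * 3**3
print("A matched to y = a at a=1,2 gives", a_at_3, "at a=3; B gives 3")

# Is A nested in B? Take A with (c1, c2) = (1, 0), i.e. y = a^2.
# Matching it at a = 1 forces b1 = 1, but then B gives 2 at a = 2 where A gives 4.
b1 = Fraction(1)
print("B matched to y = a^2 at a=1 gives", b1 * 2, "at a=2; A gives 4")


# Pair 1, assumed forms: A: y = al1*a + al2*b + al3*c, B: y = g*a*b.
def eta_linear(al, a, b, c):
    """Linear predictor of A for coefficients al = (al1, al2, al3)."""
    return al[0] * a + al[1] * b + al[2] * c


# Is A nested in B? A with al = (1, 0, 0) equals 1 at (a, b, c) = (1, 0, 0),
# while g*a*b = 0 there for every g.
print("A with al=(1,0,0) at (1,0,0):", eta_linear((1, 0, 0), 1, 0, 0), "; B gives 0")

# Is B nested in A? B with g = 1 is 0 at the three unit vectors, which forces
# al = (0, 0, 0); but then A is 0 at (1, 1, 0) where B equals 1.
forced_al = (0, 0, 0)
print("A matched to y = ab at unit vectors gives", eta_linear(forced_al, 1, 1, 0),
      "at (1,1,0); B gives 1")
```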

EDIT to clarify:

Let A be given by the model function $f_A(y;x, \theta_A)$ and B by the model function $f_B(y;x, \theta_B)$. (A model function for a random variable $Y$ means a density or probability mass function for $Y$, parametrized by some parameter ranging over some parameter space, left implicit above.) Then A is nested in B if every prediction given by A can be matched by B, that is, if for every $\theta_A$ there is some $\theta_B$ such that $f_B(y;x, \theta_B)=f_A(y;x,\theta_A)$ for all $y, x$.

Applying this to your question: the only difference between the two GLMs is in the linear predictor (we assume the same response distribution, the same link function, and so on). The linear predictor for A is $\eta_A(x)= \alpha_0 + \alpha_1 x$, and for model B it is $\eta_B(x)= \beta_0+ \beta_{11}I(x=1)+\beta_{12}I(x=2)+\beta_{13}I(x=3)$ (we use $x=0$ as the reference level; this choice does not matter).

Given some value for $\eta_A$, say $\eta_A(x)=1+0.5 \cdot x$, finding a matching $\eta_B$ is only a matter of finding one solution (it does not matter if there are more; we just need one) of the following system of equations:
\begin{align}
1+0.5 \cdot 0 &= \beta_0 \\
1+0.5 \cdot 1 &= \beta_0 + \beta_{11} \\
1+0.5 \cdot 2 &= \beta_0 + \beta_{12} \\
1+0.5 \cdot 3 &= \beta_0 + \beta_{13}
\end{align}
This is a linear system of four equations in four unknowns whose coefficient matrix is triangular with ones on the diagonal, so it has a (unique) solution: $\beta_0 = 1$, $\beta_{11} = 0.5$, $\beta_{12} = 1$, $\beta_{13} = 1.5$. The same construction works for any $\alpha_0, \alpha_1$: take $\beta_0 = \alpha_0$ and $\beta_{1k} = k\,\alpha_1$ for $k = 1, 2, 3$.
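A small sketch of this construction with exact fractions: solve the triangular system by setting $\beta_0 = \eta_A(0)$ and $\beta_{1k} = \eta_A(k) - \eta_A(0)$, then confirm that $\eta_B$ equals $\eta_A$ at every level of $x$. The function names are illustrative only.

```python
from fractions import Fraction


def eta_a(x, alpha0, alpha1):
    """Linear predictor of model A (x treated as continuous)."""
    return alpha0 + alpha1 * x


def eta_b(x, beta0, beta11, beta12, beta13):
    """Linear predictor of model B (dummy coding, x = 0 as reference level)."""
    return beta0 + beta11 * (x == 1) + beta12 * (x == 2) + beta13 * (x == 3)


def match_b_to_a(alpha0, alpha1):
    """Solve the triangular system: beta0 = eta_A(0), beta1k = eta_A(k) - eta_A(0)."""
    beta0 = eta_a(0, alpha0, alpha1)
    return (beta0,) + tuple(eta_a(k, alpha0, alpha1) - beta0 for k in (1, 2, 3))


alpha = (Fraction(1), Fraction(1, 2))   # eta_A(x) = 1 + 0.5 * x
betas = match_b_to_a(*alpha)
print([str(b) for b in betas])  # ['1', '1/2', '1', '3/2']

# The match holds at every level of x, so B reproduces A exactly.
assert all(eta_b(x, *betas) == eta_a(x, *alpha) for x in range(4))
```

Because the solution exists for every $(\alpha_0, \alpha_1)$, the set of linear predictors of A is a subset of those of B, which is exactly the nesting condition.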

I'm sorry, I still don't get "gives the same predicted values". They are two different models with different numbers of parameters/coefficients. How can their predicted values be the same? Even for a "symbolically nested" model, they should give different predicted values when we fit them to the same data. — rmarkdown, Feb 19 '20 at 18:15
See https://stats.stackexchange.com/questions/4717/what-is-the-difference-between-a-nested-and-a-non-nested-model — kjetil b halvorsen, Feb 19 '20 at 19:06
According to your definition, why are the two additional examples not nested models? — rmarkdown, Feb 20 '20 at 00:46
