
Let's say my predictor $x$ has four levels: 1, 2, 3, and 4. Fitting a logistic regression will generate estimates of $\beta$ for x = 2, x = 3, and x = 4, and I understand that each of these estimates is the log odds ratio of that level of x relative to the reference level, x = 1 (so exponentiating an estimate gives the odds ratio).

My question is, how do I express that model in a general form?

I don't think this is correct:

$\log\frac{\pi(x)}{1-\pi(x)}=\beta_0+\beta_1x + \beta_2x+\beta_3x$

...because this would imply that I could set different values for each x, which I cannot. I'm not allowed to do x=2 AND x=3, for example.

But then this ALSO seems incorrect:

$\log\frac{\pi(x)}{1-\pi(x)}=\beta_0+\beta x$

...because I have multiple slopes here, not just a single $\beta$.

So how do I actually write this formula?

Dan W
  • How would you do it in a linear regression (e.g., ANOVA)? – Dave Feb 25 '22 at 17:33
  • See https://stats.stackexchange.com/questions/133623. Perhaps that answers your question? If not, searching our site for "dummy variable coding" will turn up plenty of helpful material. – whuber Feb 25 '22 at 18:07
  • @Dave - that's the second formula, but that doesn't parallel my situation, because in my situation I generate multiple slopes, not just one. – Dan W Feb 25 '22 at 19:47
  • What is your $x$ variable in the second formula when you do an ANOVA? – Dave Feb 25 '22 at 19:50
  • @Dave x = 1, 2, 3, or 4 in my example. – Dan W Feb 25 '22 at 20:00
  • Why doesn't that work for logistic regression? – Dave Feb 25 '22 at 20:01
  • @Dave because each individual x has a unique $\beta$ in logistic regression. – Dan W Feb 25 '22 at 20:03
  • Why doesn’t that happen in linear regression? – Dave Feb 25 '22 at 20:08
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/134432/discussion-between-dan-w-and-dave). – Dan W Feb 25 '22 at 20:50

1 Answer


Let's first work it out in the case of linear regression.

Let's take a predictor variable that has two levels, cat and dog, and that will be our only predictor.

$$ \mathbb E[y] = \beta_0x_{cat} + \beta_1x_{dog} $$

The $x_{animal}$ variables take the value $1$ if the subject is that animal and $0$ if the subject is not that animal.

However, it is more common to use an intercept and then an indicator variable that compares one level of the factor to the other.

$$ \mathbb E[y] = \beta_0 + \beta_1x_{dog} $$

Here, $x_{dog}$ takes the value $1$ if the subject is a dog and $0$ if the subject is not a dog. We could just as easily have used an indicator for cats instead. Either way, the cats aren't missing from the model: they become the reference category, with their mean given by the intercept $\beta_0$, and $\beta_1$ is the amount by which the mean for dogs differs from the mean for cats.

For example, if we have cats with a mean mass of $4$ kg and dogs with a mean mass of $40$ kg, our regression equation would be $\mathbb E[y] = 4 + 36x_{dog}$.

Now let's also look at horses, which have a mean mass of $250$ kg. We expand our regression equation to have a variable $x_{horse}$ that takes $1$ if the subject is a horse and $0$ otherwise.

$$ \mathbb E[y] = 4 + 36x_{dog} + 246x_{horse} $$

If we include alligators that have a mean mass of $225$ kg, then we get:

$$ \mathbb E[y] = 4 + 36x_{dog} + 246x_{horse} + 221x_{gator} $$
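As a quick check of the arithmetic, here is a minimal Python sketch that evaluates this equation for each animal. The function names `dummy_code` and `expected_mass` are made up for illustration; the coefficients are the ones in the equation above.

```python
# Coefficients from the equation above: the intercept is the cat mean,
# and each offset is that animal's mean minus the cat mean.
INTERCEPT = 4
OFFSETS = {"dog": 36, "horse": 246, "gator": 221}


def dummy_code(animal):
    """Return the indicator variables x_dog, x_horse, x_gator for one subject."""
    return {name: int(animal == name) for name in OFFSETS}


def expected_mass(animal):
    """Evaluate E[y] = 4 + 36*x_dog + 246*x_horse + 221*x_gator."""
    x = dummy_code(animal)
    return INTERCEPT + sum(OFFSETS[name] * x[name] for name in OFFSETS)


for animal in ["cat", "dog", "horse", "gator"]:
    print(animal, dummy_code(animal), expected_mass(animal))
```

At most one indicator is ever $1$, so each subject picks up the intercept plus at most one offset. A cat has all indicators equal to $0$ and gets the intercept alone.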

If we didn't know the means for each animal and had to estimate them from the data, we might propose the following model and then estimate the coefficients using a method like ordinary least squares.

$$ \mathbb E[y] = \beta_0 + \beta_1x_{dog} + \beta_2x_{horse} + \beta_3x_{gator} $$
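To see what that estimation does with indicator variables, here is a rough sketch in plain Python. The masses are made up for illustration (chosen so the group means match the numbers used earlier), and the variable names are hypothetical. Rather than calling a regression library, it checks that "reference mean plus differences from the reference mean" satisfies the normal equations $X^\top(y - X\beta) = 0$, which is the condition that characterizes the least squares solution.

```python
# Made-up masses in kg (hypothetical data with group means 4, 40, 250, 225).
data = {
    "cat": [3.0, 4.0, 5.0],
    "dog": [30.0, 40.0, 50.0],
    "horse": [240.0, 250.0, 260.0],
    "gator": [215.0, 225.0, 235.0],
}
REFERENCE = "cat"
others = [a for a in data if a != REFERENCE]  # dog, horse, gator

# Build the design matrix rows [1, x_dog, x_horse, x_gator] and the response y.
rows, y = [], []
for animal, masses in data.items():
    for mass in masses:
        rows.append([1.0] + [float(animal == a) for a in others])
        y.append(mass)


def mean(values):
    return sum(values) / len(values)


# Candidate coefficients: reference mean, then each group's difference from it.
ref_mean = mean(data[REFERENCE])
beta = [ref_mean] + [mean(data[a]) - ref_mean for a in others]

# Normal equations: every column of X must be orthogonal to the residuals.
fitted = [sum(b * x for b, x in zip(beta, row)) for row in rows]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
gradient = [
    sum(row[j] * r for row, r in zip(rows, residuals)) for j in range(len(beta))
]

print("beta:", beta)
print("X^T residuals:", gradient)
```

Because the design matrix has full column rank, coefficients that zero out `gradient` are the unique ordinary least squares estimates. So the fitted intercept is the reference group's sample mean and each slope is a difference in sample means.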

But you didn't ask about linear models. You asked about generalized linear models that deal with $g(\mathbb E[y])$ instead of just $\mathbb E[y]$. For a binary response variable $y$ whose log-odds of occurrence depends on the species (cat, dog, horse, or alligator), we might propose the following model, for $g(p)=\log\big(\frac{p}{1-p}\big)$, $p\in(0,1)$.

$$ g(\mathbb E[y]) = \beta_0 + \beta_1x_{dog} + \beta_2x_{horse} + \beta_3x_{gator} $$

As usual, the $x_{animal}$ variables take $1$ if the subject is that animal and $0$ otherwise.
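In the notation of the question, with level 1 as the reference, the same model can be written with one indicator per non-reference level:

$$ \log\frac{\pi(x)}{1-\pi(x)} = \beta_0 + \beta_1\mathbb 1[x=2] + \beta_2\mathbb 1[x=3] + \beta_3\mathbb 1[x=4] $$

Here $\mathbb 1[x=k]$ is $1$ when $x=k$ and $0$ otherwise, so only one slope is ever "switched on" for a given subject. The sketch below illustrates this in Python. The probabilities are hypothetical, not from the question, and the coefficients are computed directly from them rather than estimated from data.

```python
import math


def logit(p):
    """Log-odds g(p) = log(p / (1 - p))."""
    return math.log(p / (1 - p))


def inv_logit(eta):
    """Inverse of the logit: maps a linear predictor back to a probability."""
    return 1 / (1 + math.exp(-eta))


# Hypothetical event probabilities for levels 1..4 of x (illustrative only).
probs = {1: 0.20, 2: 0.50, 3: 0.80, 4: 0.35}

# Level 1 is the reference: its log-odds is the intercept, and each other
# coefficient is that level's log-odds minus the reference log-odds.
beta0 = logit(probs[1])
betas = {level: logit(probs[level]) - beta0 for level in (2, 3, 4)}


def linear_predictor(x):
    """beta_0 plus the coefficient of whichever indicator 1[x == level] is on."""
    return beta0 + sum(b * (x == level) for level, b in betas.items())


# Each level's probability is recovered from the intercept and one coefficient.
for x in (1, 2, 3, 4):
    print(x, round(inv_logit(linear_predictor(x)), 4))

# Exponentiating a coefficient gives the odds ratio versus the reference level.
odds_ratios = {level: math.exp(b) for level, b in betas.items()}
print(odds_ratios)
```

Setting "x=2 AND x=3" is impossible by construction: the indicators are all derived from the single value of $x$, so at most one of them equals $1$.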

Dave