8

I devised this toy example

library(sigmoid)
N <- 10000
age <- runif(N, min=20, max=90)
e <- rnorm(N, 0, 5)
hi <- logistic(-100+2*age+e)
hid <- ifelse(hi>=0.5, T, F)
hid <- as.factor(hid)
df <- data.frame(age=age, hid=hid)
lr <- glm(hid~age, data=df, family=binomial(link="logit"))
s <- summary(lr)
print(s)

The variable hid contains 4304 FALSE and 5696 TRUE.

I would have expected to get the correct coefficients out of the logistic regression.

Instead I am getting -39.46 for the intercept and 0.79 for the slope. Both with p-values $\approx$ 0.

What am I doing wrong?

robertspierre
  • 1,358
  • 6
  • 21

1 Answers1

15

If you're trying to generate data from logistic regression's assumed data generating mechanism, your code does not do that.

Logistic regression's data generating mechanism looks like

$$ \eta = X\beta$$ $$ p = \dfrac{1}{1+e^{-\eta}}$$ $$ y \sim \operatorname{Binomial}(p, n) $$

What it looks like you're trying to do is create a linear regression in the log odds space, error term included. That is incorrect. The error term comes from the binomial likelihood. To create data properly so that glm will estimate the parameters you've specified, you need to do

library(sigmoid)
N <- 10000
age <- runif(N, min=20, max=90)
#Changes here
p <- logistic(-100+2*age)
hid <- rbinom(N, 1, p)
# End changes
df <- data.frame(age=age, hid=hid)
lr <- glm(hid~age, data=df, family=binomial(link="logit"))
s <- summary(lr)
print(s)

```
Demetri Pananos
  • 24,380
  • 1
  • 36
  • 94
  • 2
    Thank you very much. I misunderstood the DGP of the logistic regression. Thank you! – robertspierre Apr 11 '21 at 23:20
  • Can I ask for more information? $$y \sim Binomial(p, \eta) $$ Is this saying "y is the equivalent of the joint binomial distribution of p and eta? Or is Binomial(foo, bar) here some function that takes foo and bar as arguments? – rorance_ Apr 12 '21 at 06:33
  • 2
    @rorance_ It means that `y` is distributed like a Binomial, with parameters `p` (probability of success of the single experiment) and `n` (number of experiments) – robertspierre Apr 12 '21 at 07:24