In this post there is a script that generates data for a logistic regression:
set.seed(666)
x1 = rnorm(1000,0,1) # some continuous variables
x2 = rnorm(1000,0,1)
z = 1 + 2*x1 + 3*x2 # linear combination with a bias
pr = 1/(1+exp(-z)) # pass through an inv-logit function
y = rbinom(length(x1),1,pr) # bernoulli response variable
df = data.frame(y=y,x1=x1,x2=x2)
glm( y~x1+x2,data=df,family="binomial")
From this script I want to: 1. replace $x_2$ with a quadratic term in $x_1$; and 2. compare $x_1$ when it is not centred (change in mean) and more or less variable (change in sd). Below is my modified script:
set.seed(666)
x1 = rnorm(1000,0,1)
z = 1 + 2*x1 + 3*x1^2
pr = 1/(1+exp(-z))
y = rbinom(length(x1),1,pr)
df = data.frame(y=y,
x1=x1,
x2=x1^2)
glm( y~x1+x2,
data=df,
family="binomial")
Call: glm(formula = y ~ x1 + x2, family = "binomial", data = df)
Coefficients:
(Intercept) x1 x2
1.002 2.437 3.490
Degrees of Freedom: 999 Total (i.e. Null); 997 Residual
Null Deviance: 795.3
Residual Deviance: 615.9 AIC: 621.9
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
The estimates seem a little off, but they are close to the coefficients that were theoretically put into the model. However, as soon as I change the mean, the algorithm does not converge:
set.seed(666)
x1 = rnorm(1000,10,1)
z = 1 + 2*x1 + 3*x1^2
pr = 1/(1+exp(-z))
y = rbinom(length(x1),1,pr)
df = data.frame(y=y,
x1=x1,
x2=x1^2)
glm( y~x1+x2,
data=df,
family="binomial")
Call: glm(formula = y ~ x1 + x2, family = "binomial", data = df)
Coefficients:
(Intercept) x1 x2
2.657e+01 -2.351e-08 1.234e-09
Degrees of Freedom: 999 Total (i.e. Null); 997 Residual
Null Deviance: 0
Residual Deviance: 5.802e-09 AIC: 6
Warning message:
glm.fit: algorithm did not converge
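For what it's worth, the non-convergence can be traced back to the simulated probabilities themselves: with x1 centred at 10, z is roughly 1 + 20 + 300 on average, so every pr is numerically 1 and y has no variation at all (which is also why the null deviance is 0). A quick check, reusing the script's own quantities:

```r
set.seed(666)
x1 <- rnorm(1000, 10, 1)
z  <- 1 + 2*x1 + 3*x1^2              # roughly 1 + 20 + 300 on average
pr <- 1/(1 + exp(-z))
range(pr)                            # both endpoints are numerically 1
table(rbinom(length(x1), 1, pr))     # every simulated response is 1
```

With a constant response there is nothing for glm to estimate, so the failure is in the simulated data, not in the fitting algorithm.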
The only way I found to get back to the same estimates was to standardize the raw data (which makes the mean 0, so I am back to the first example).
set.seed(666)
x1 = rnorm(1000,10,1)
x1=scale(x1)
z = 1 + 2*x1 + 3*(x1)^2
pr = 1/(1+exp(-z))
y = rbinom(length(x1),1,pr)
df = data.frame(y=y,
x1=x1,
x2=x1^2)
glm( y~x1+x2,
data=df,
family="binomial")
Call: glm(formula = y ~ x1 + x2, family = "binomial", data = df)
Coefficients:
(Intercept) x1 x2
0.9872 2.4292 3.5237
Degrees of Freedom: 999 Total (i.e. Null); 997 Residual
Null Deviance: 787.8
Residual Deviance: 605.7 AIC: 611.7
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred
So how is it possible to get logistic estimates equal to what was theoretically put into the linear combination, but with an $x_1$ that is e.g. x1 = rnorm(1000,10,2)?
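One observation that may help frame this: the "theoretical" coefficients (1, 2, 3) only exist on the scale on which x1 entered the linear combination. If the generating model is written on a standardized variable u = (x1 - m)/s, expanding u gives the raw-scale coefficients implied by (1, 2, 3). A sketch (not from the original post; m and s here are assumed to be the known generating mean and sd):

```r
# If z = b0 + b1*u + b2*u^2 with u = (x1 - m)/s, expanding u gives
# the raw-scale (x1) coefficients:
#   x1^2 term:  b2/s^2
#   x1   term:  b1/s - 2*b2*m/s^2
#   constant :  b0 - b1*m/s + b2*m^2/s^2
set.seed(666)
m <- 10; s <- 2
x1 <- rnorm(1000, m, s)
u  <- (x1 - m)/s                 # standardized version of x1
z  <- 1 + 2*u + 3*u^2            # generate on the standardized scale
pr <- 1/(1 + exp(-z))
y  <- rbinom(length(x1), 1, pr)
# raw-scale coefficients implied by (b0, b1, b2) = (1, 2, 3):
c(1 - 2*m/s + 3*m^2/s^2, 2/s - 2*3*m/s^2, 3/s^2)   # 66, -14, 0.75
glm(y ~ x1 + I(x1^2), family = "binomial")
```

So fitting on the raw x1 should recover (approximately) the back-transformed coefficients rather than (1, 2, 3); the two parameterizations describe the same model.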
In addition, is it possible to add an error term here: z = 1 + 2*x1 + 3*x2 + rnorm(length(x1),0,1)? This would be the equivalent of saying $y = \alpha_0 + \beta_1 x_1 + \beta_2 x_1^2 + \varepsilon$.
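Mechanically, adding such a term is straightforward; a sketch (using x1^2 directly, since x2 = x1^2 in the modified script). Note, as an assumption on my part, that the noise here enters on the latent scale z rather than on the binary y, which makes this an overdispersed (latent-noise) logistic model, and the plain glm estimates will generally be attenuated toward zero relative to (1, 2, 3):

```r
set.seed(666)
x1 <- rnorm(1000, 0, 1)
# extra N(0,1) noise on the linear predictor (latent scale):
z  <- 1 + 2*x1 + 3*x1^2 + rnorm(length(x1), 0, 1)
pr <- 1/(1 + exp(-z))
y  <- rbinom(length(x1), 1, pr)
glm(y ~ x1 + I(x1^2), family = "binomial")
```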