If you had to use linear regression for classification, how would you achieve this?
Check this question; I even have the code there: https://stats.stackexchange.com/questions/326350/what-is-happening-here-when-i-use-squared-loss-in-logistic-regression-setting – Haitao Du Mar 04 '18 at 05:04
1 Answer
See also the Binary data section of the Wikipedia article on generalized linear models, particularly the Identity link subsection and its link to the linear probability model. In R, you can fit a binomial model with an identity link function as follows:
# simulate data
set.seed(10162295)
options(digits = 5, nwarnings = 5)
n <- 1000
x <- runif(n, -.5, .5)
# P(y = 1 | x) = 0.5 + x, which lies in [0, 1] because x is in [-0.5, 0.5]
y <- 0.5 + x > runif(n)
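# fit model; with the identity link the starting values must give fitted
# probabilities inside (0, 1), hence start and mustart
fit <- glm(
  y ~ x, binomial("identity"), start = c(0.5, 0), mustart = rep(.5, n))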
#R> Warning messages:
#R> 1: step size truncated: out of bounds
#R> 2: step size truncated: out of bounds
#R> 3: step size truncated: out of bounds
#R> 4: step size truncated: out of bounds
#R> 5: step size truncated: out of bounds
summary(fit)
#R>
#R> Call:
#R> glm(formula = y ~ x, family = binomial("identity"), start = c(0.5,
#R> 0), mustart = rep(0.5, n))
#R>
#R> Deviance Residuals:
#R> Min 1Q Median 3Q Max
#R> -2.492 -0.823 -0.080 0.810 2.921
#R>
#R> Coefficients:
#R> Estimate Std. Error z value Pr(>|z|)
#R> (Intercept) 0.49887 0.00703 71 <2e-16 ***
#R> x 0.99907 0.01407 71 <2e-16 ***
#R> ---
#R> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#R>
#R> (Dispersion parameter for binomial family taken to be 1)
#R>
#R> Null deviance: 1386.04 on 999 degrees of freedom
#R> Residual deviance: 973.94 on 998 degrees of freedom
#R> AIC: 977.9
#R>
#R> Number of Fisher Scoring iterations: 25
#R>
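For comparison, the linear probability model mentioned above can also be fit by ordinary least squares. The following is only a minimal sketch, assuming the same simulated data as in the glm example; the name lpm is made up for illustration, and the usual lm standard errors do not account for the heteroskedasticity of a binary outcome.

```r
# simulate the same data as in the glm example
set.seed(10162295)
n <- 1000
x <- runif(n, -.5, .5)
y <- 0.5 + x > runif(n)

# linear probability model: ordinary least squares on the 0/1 outcome
# (lpm is a hypothetical name used only in this sketch)
lpm <- lm(as.numeric(y) ~ x)

# the estimates should be close to the true intercept 0.5 and slope 1
coef(lpm)
```

Unlike the binomial fit, least squares places no constraint on the fitted values, so they can fall outside [0, 1].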
You can make predictions for new observations using the predict method, where the newdata argument contains the new observations.
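As a rough sketch of what that could look like, the code below refits the model and predicts for a few new x values. The data frame new_obs and the 0.5 cut-off used to turn probabilities into classes are assumptions made for illustration, not part of the fit above.

```r
# refit the model from above so that this sketch is self-contained
set.seed(10162295)
n <- 1000
x <- runif(n, -.5, .5)
y <- 0.5 + x > runif(n)
fit <- suppressWarnings(glm(
  y ~ x, binomial("identity"), start = c(0.5, 0), mustart = rep(.5, n)))

# new_obs is a hypothetical data frame holding the new observations;
# its column name must match the covariate name in the formula
new_obs <- data.frame(x = c(-0.4, 0, 0.3))

# type = "response" returns the estimated P(y = 1 | x)
p_hat <- predict(fit, newdata = new_obs, type = "response")

# classify with an assumed cut-off of 0.5
y_hat <- p_hat > 0.5
```

With the identity link, nothing keeps predictions inside [0, 1] for new x values outside the range seen in the data, so it may be worth checking p_hat before thresholding.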
Do also see the post that @hxd1011 links to.

Benjamin Christoffersen