If you had to use linear regression for classification, how would you achieve this?

Jun Jang
  • Check this question; I even have the code there: https://stats.stackexchange.com/questions/326350/what-is-happening-here-when-i-use-squared-loss-in-logistic-regression-setting – Haitao Du Mar 04 '18 at 05:04

1 Answer

If you had to use linear regression for classification, how would you achieve this?

See the Binary data section of the Wikipedia article on generalized linear models (GLMs), particularly the Identity link subsection and its link to the linear probability model. In R, you can fit a binomial model with an identity (linear) link function like this:

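# simulate data
set.seed(10162295)
options(digits = 5, nwarnings = 5)
n <- 1000
x <- runif(n, -.5, .5)
# y is TRUE with probability 0.5 + x, which lies in [0, 1] for this range of x
y <- 0.5 + x > runif(n)

# fit a binomial GLM with identity link; the starting values keep the
# initial fitted probabilities inside (0, 1)
fit <- glm(
  y ~ x, family = binomial("identity"), start = c(0.5, 0),
  mustart = rep(.5, n))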
#R> Warning messages:
#R> 1: step size truncated: out of bounds 
#R> 2: step size truncated: out of bounds 
#R> 3: step size truncated: out of bounds 
#R> 4: step size truncated: out of bounds 
#R> 5: step size truncated: out of bounds 
summary(fit)
#R>  
#R> Call:
#R> glm(formula = y ~ x, family = binomial("identity"), start = c(0.5, 
#R>     0), mustart = rep(0.5, n))
#R> 
#R> Deviance Residuals: 
#R>    Min      1Q  Median      3Q     Max  
#R> -2.492  -0.823  -0.080   0.810   2.921  
#R> 
#R> Coefficients:
#R>             Estimate Std. Error z value Pr(>|z|)    
#R> (Intercept)  0.49887    0.00703      71   <2e-16 ***
#R> x            0.99907    0.01407      71   <2e-16 ***
#R> ---
#R> Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#R> 
#R> (Dispersion parameter for binomial family taken to be 1)
#R> 
#R>     Null deviance: 1386.04  on 999  degrees of freedom
#R> Residual deviance:  973.94  on 998  degrees of freedom
#R> AIC: 977.9
#R> 
#R> Number of Fisher Scoring iterations: 25
#R> 
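
For comparison, here is a minimal sketch of the plain linear probability model mentioned above: ordinary least squares on the 0/1 outcome, followed by thresholding the fitted values. The names lpm, p_hat and y_hat and the 0.5 cutoff are my own assumptions for illustration, not part of the fit above.

```r
# simulate the same data as above
set.seed(10162295)
n <- 1000
x <- runif(n, -.5, .5)
y <- 0.5 + x > runif(n)

# ordinary least squares on the 0/1 outcome (linear probability model)
lpm <- lm(as.numeric(y) ~ x)
coef(lpm)

# fitted values estimate P(y = 1 | x); unlike the binomial fit,
# OLS does not constrain them to lie in [0, 1]
p_hat <- fitted(lpm)
range(p_hat)

# classify by thresholding at 0.5 (an assumed cutoff)
y_hat <- p_hat > 0.5
mean(y_hat == y)  # in-sample accuracy
```

The point estimates should be close to those of the identity-link GLM, but the usual lm standard errors ignore that the variance of a binary outcome depends on its mean, so treat them with caution.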

You can make predictions for new observations with the predict method, passing the new observations in the newdata argument.
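
A rough sketch of that prediction step, refitting the model so the snippet runs on its own. The data frame new_obs and the 0.5 cutoff are hypothetical choices of mine.

```r
# simulate data and fit the identity-link model as above
set.seed(10162295)
n <- 1000
x <- runif(n, -.5, .5)
y <- 0.5 + x > runif(n)
fit <- suppressWarnings(glm(
  y ~ x, family = binomial("identity"), start = c(0.5, 0),
  mustart = rep(.5, n)))

# hypothetical new observations; the column name must match the predictor x
new_obs <- data.frame(x = c(-0.25, 0, 0.25))

# type = "response" returns the fitted probabilities P(y = 1 | x)
p_hat <- predict(fit, newdata = new_obs, type = "response")

# turn probabilities into class labels with an assumed 0.5 cutoff
y_hat <- p_hat > 0.5
```

With an identity link the linear predictor is the probability itself, so type = "link" would give the same numbers here. Predictions for x far outside the observed range can fall outside [0, 1].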

Do also see the post that @hxd1011 links to.