Using regularization with logistic regression

Question

I have a data set of 3000 observations with 9 variables, and I'm trying to predict whether water is safe for drinking. Regular multivariate logistic regression isn't very good at forecasting, and none of the coefficients is significant, even when I run univariate logistic regressions. This is why I thought of regularization, but I wasn't able to find an explanation of it or of when it is appropriate to use. Also, if they exist, I'd be happy for a reference to R functions.

Is your measured response variable binary or some measure of contamination (e.g., 6 parts per million coronavirus)? — Dave, Jun 27 '21 at 19:54
There are plenty of regularised GLMs out there, I believe. glmnet is quite popular and has a vignette. However, do you have any expectation of what the relationship is between the inputs and "safe"? E.g., I could imagine "not safe to drink" being "legally" defined as chemical 1 > conc1 or chemical 2 > conc2 or chemical 3 > conc3. I don't believe you can fit this in a logistic regression (without adding some nonlinearities). — seanv507, Jun 27 '21 at 21:10
Statistical significance has nothing to do with regularization and forecasting. What doesn’t work about forecasting with logistic regression for you? — Tim, Jun 27 '21 at 21:17
This is mostly a class exercise. There are all kinds of substances and measures, like chloramines and pH levels. The prediction accuracy is around 58%, which is quite poor in a case like this, as it is a health issue. — Ift h, Jun 28 '21 at 14:24

score 1 · Answer 1 · answered Jun 27 '21 at 21:07

Regularisation aims to reduce the effects of the design matrix being overdetermined or underdetermined. Recall solving $Ax=b$ with $A \in \mathbb{R}^{m \times p}$, where $m$ is the number of observations and $p$ the number of variables. Regularisation is appropriate if $p \gg m$ (underdetermined) or $p \ll m$ (overdetermined). Here $m \gg p$, so the problem is overdetermined ($m=3000$, $p=9$).
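For a binary response, the regularised problem can be written out explicitly. The form below follows the elastic-net objective described in the glmnet documentation. The notation is introduced here for illustration and is not part of the answer above: $a_i^\top$ is the $i$-th row of $A$, $y_i \in \{0,1\}$ is the response, $\beta_0$ is an intercept, $\beta \in \mathbb{R}^p$ plays the role of $x$, and $\lambda \ge 0$, $\alpha \in [0,1]$ are tuning parameters.

$$
\min_{\beta_0,\,\beta}\; -\frac{1}{m}\sum_{i=1}^{m}\Big[\,y_i\,(\beta_0 + a_i^\top\beta) - \log\!\big(1 + e^{\beta_0 + a_i^\top\beta}\big)\Big] \;+\; \lambda\Big[\tfrac{1-\alpha}{2}\,\lVert\beta\rVert_2^2 + \alpha\,\lVert\beta\rVert_1\Big]
$$

Setting $\lambda = 0$ gives plain logistic regression, $\alpha = 1$ gives the LASSO penalty, and $\alpha = 0$ gives the ridge penalty.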

LASSO or elastic-net regularisation is recommended instead of plain logistic regression. Without regularisation, the fitted solution may not be reliable. The introduction to glmnet gives a good idea of how to use LASSO and elastic-net regularisation.
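To make the effect of a LASSO penalty concrete, here is a minimal sketch in plain Python (standard library only). It fits an L1-penalised logistic regression by proximal gradient descent on made-up data. It is an illustration under stated assumptions, not how glmnet works internally (glmnet uses coordinate descent): the function names, step size, iteration count, and synthetic data are all hypothetical choices. In R, the corresponding call would be along the lines of `glmnet(x, y, family = "binomial")`.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def soft_threshold(v, t):
    # Proximal operator of t * |v|: shrink v toward zero by t, clipping at zero.
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def fit_lasso_logistic(X, y, lam, step=0.5, n_iter=300):
    """Minimise mean logistic loss + lam * sum(|beta_j|) by proximal gradient descent.

    The intercept beta0 is not penalised. step and n_iter are illustrative defaults.
    """
    n, p = len(X), len(X[0])
    beta0, beta = 0.0, [0.0] * p
    for _ in range(n_iter):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            # Residual: predicted probability minus observed label.
            r = sigmoid(beta0 + sum(b * v for b, v in zip(beta, xi))) - yi
            g0 += r
            for j in range(p):
                g[j] += r * xi[j]
        # Plain gradient step for the intercept, gradient step plus shrinkage for the rest.
        beta0 -= step * g0 / n
        beta = [soft_threshold(b - step * gj / n, step * lam) for b, gj in zip(beta, g)]
    return beta0, beta


# Synthetic, made-up data: three standardised features, only the first one matters.
random.seed(0)
X = [[random.gauss(0, 1) for _ in range(3)] for _ in range(200)]
y = [1 if random.random() < sigmoid(2.0 * x[0]) else 0 for x in X]

# Fit with no penalty, a mild penalty, and a heavy penalty.
fits = {lam: fit_lasso_logistic(X, y, lam) for lam in (0.0, 0.05, 1.0)}
for lam, (b0, b) in fits.items():
    print(f"lambda={lam}: intercept={b0:.3f}, coefficients={[round(v, 3) for v in b]}")
```

As the penalty grows, the coefficients shrink, and a large enough penalty sets all of them exactly to zero. In practice the penalty strength is usually chosen by cross-validation, which `cv.glmnet` does in R.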
