How to group continuous variables in LASSO for a multinomial regression model?

Question

To select predictors for a 4-level outcome variable, I want to apply LASSO. Some continuous variables are related to each other and should either all be in the final model or all be left out.

Context: The dataset consists of around 1700 rows of data ($n$) with around 30 predictors (a mix of continuous predictors, binary categorical and multilevel categorical ones). As stated, the outcome/dependent variable is a 4-level categorical one (not ordered). The frequency of the 4 outcome groups is around 600/250/250/600.

I reckon some associations between the continuous predictors and the outcome are not linear, but I have no idea of the functional form of these possible non-linear associations. I therefore want to use (restricted cubic) splines to allow for non-linear associations for the continuous variables.

The problem is that creating a spline is actually a data transformation: a continuous variable $x$ is used to create 'spline variables' based on the spline knots. Simply put, when the knots are set at 30 and 60, the set of spline variables would consist of $x$, $x'$ and $x''$, where:

$x$ is the original continuous variable;
$x'$ is $0$ when $x < 30$, and a function $f(x-30)$ when $x \geq 30$;
$x''$ is $0$ when $x < 60$, and a function $f(x-60)$ when $x \geq 60$.
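The construction above can be sketched in a few lines. This is only an illustration: it assumes $f(u) = u^3$ (a plain truncated cubic term), whereas a true restricted cubic spline also constrains the tails to be linear. The function name `truncated_power_basis` is made up for this example.

```python
def truncated_power_basis(x, knots, power=3):
    """Return [x, (x - k1)_+^power, (x - k2)_+^power, ...] for a single value x.

    Assumption: f(u) = u**power; real restricted cubic splines use a
    modified basis that is linear beyond the outer knots.
    """
    x = float(x)
    return [x] + [max(x - k, 0.0) ** power for k in knots]


# One row of spline variables (x, x', x'') per observation, knots at 30 and 60.
rows = [truncated_power_basis(x, knots=(30, 60)) for x in (20, 45, 70)]
for row in rows:
    print(row)
```

Each observation thus contributes three columns to the model matrix, and these three columns are what should enter or leave the model together.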

My strategy now is to obtain the restricted cubic spline data through a regular multinomial regression, extract the model matrix, and feed this into a LASSO function. As stated, I want each set of spline variables to be kept together or dropped completely (just as the dummies for a categorical variable would be). However, I have not been able to find a proper function (in R) that does grouped LASSO for multinomial regression. Moreover, the elastic net functions I've found do not support spline fitting within the function, do not support grouping of predictor variables, and/or do not support multinomial regression models.
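To make the target concrete, one way to write down the kind of objective being asked for is a multinomial log-likelihood with a group penalty. The notation here is an assumption for illustration and is not taken from any particular package: $B$ is the $p \times 4$ coefficient matrix with columns $\beta_1, \dots, \beta_4$ (one per outcome class), $x_i$ is row $i$ of the model matrix, $y_i$ is the observed class, $B_g$ is the block of rows of $B$ belonging to predictor group $g$ (for example the rows for $x$, $x'$ and $x''$), $p_g$ is the number of rows in that block, and the intercepts are left out for brevity.

$$
\min_{B} \; -\frac{1}{n} \sum_{i=1}^{n} \log \frac{\exp(x_i^\top \beta_{y_i})}{\sum_{k=1}^{4} \exp(x_i^\top \beta_k)} \; + \; \lambda \sum_{g=1}^{G} \sqrt{p_g} \, \lVert B_g \rVert_F
$$

Because the penalty acts on the norm of the whole block $B_g$ rather than on its individual entries, a block is either entirely zero or (generically) entirely non-zero, which is the all-or-none behaviour described above.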

For example, the 'glmnet' documentation does not mention grouping of variables, and the 'gglasso' and 'grplasso' packages are for binary outcomes only (or so it seems).

In short: is it possible to perform grouped LASSO for a multinomial regression model, or to fit splines within these functions? And if so, how? (Which software allows this? I'd prefer an R package.)

P.S. This question is related to this one, but looking at the comments and answers there I am definitely looking for something else. The 'type.multinomial = "grouped"' option in glmnet does not keep specific variables together; instead, it keeps a single variable in for all outcomes of a multinomial regression if it is retained for any one of the outcomes (i.e. even when using this option I still see certain spline variables dropped while related spline variables are retained). Further, as stated, the answer provided there (using gglasso) does not apply to multinomial regression.
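The distinction can be illustrated with the block soft-thresholding step that group-lasso solvers typically apply. The sketch below is not glmnet's or gglasso's code; the function name and the example numbers are invented. The block here spans both the spline terms of one variable (rows) and the four outcome classes (columns), so the whole set of spline variables is shrunk or zeroed as one unit.

```python
import math


def group_soft_threshold(block, threshold):
    """Shrink a coefficient block toward zero by its Euclidean (Frobenius) norm.

    block: list of rows, one row per spline term, one column per outcome class.
    Returns an all-zero block when the norm is at or below the threshold.
    """
    norm = math.sqrt(sum(v * v for row in block for v in row))
    if norm <= threshold:
        return [[0.0 for _ in row] for row in block]
    scale = 1.0 - threshold / norm
    return [[scale * v for v in row] for row in block]


# Hypothetical blocks for x, x', x'' (rows) across 4 outcome classes (columns).
weak = [[0.1, 0.0, -0.1, 0.0], [0.0, 0.2, 0.0, 0.0], [0.0, 0.0, 0.0, 0.1]]
strong = [[3.0, 0.0, 0.0, 0.0], [0.0, 4.0, 0.0, 0.0], [0.0, 0.0, 12.0, 0.0]]

weak_out = group_soft_threshold(weak, threshold=0.5)      # whole block zeroed
strong_out = group_soft_threshold(strong, threshold=1.0)  # whole block kept, shrunk
```

With an entry-wise (ordinary LASSO) threshold, individual spline terms could be zeroed while their siblings survive; with the block version that cannot happen.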

If you know what objective you have in mind, why do you need a package to do it? Just use an optimization library. If you want to use R, you can use the package here (https://web.stanford.edu/~boyd/papers/pdf/cvxr_paper.pdf) for optimization, for instance. — user795305, Dec 06 '17 at 16:13
@user795305 could you elaborate with an answer how to? I'm not that proficient with writing my own optimization functions — IWS, Dec 06 '17 at 16:14
I'm not sure I understand your objective. Lasso doesn't really drop any variables--it estimates coefficients. You *interpret* a zero coefficient as "dropping," but you don't have to: you may equally well think of it as "keeping" the corresponding term but applying an infinitesimal coefficient to it. What, then, is your objective in "keeping" the groups of spline terms, when there seems to be no actual functional meaning to "dropping" or "keeping" variables? — whuber, Dec 06 '17 at 16:24
@whuber , fair enough, but what then would you do with a set of splinevariables where, for example, $β_x$ and $β_{x'}$ are shrunk to zero, but $β_{x''}$ (note the double accent) is not? Keep the variable and its splines? And what if only the original variable coefficient $β_x$ is not shrunk to zero, but the others are? — IWS, Dec 06 '17 at 16:28
In one recent project of mine, whose purpose was solely for prediction, I counted and reported on the terms with nonzero coefficients. One way to interpret this is that selecting any subset of the spline terms for a variable is tantamount to selecting a particular family of nonlinear transformations of that variable. — whuber, Dec 06 '17 at 16:33
@whuber that does seem a good solution, better yet, it means I could continue with the code I have now. Does this then also apply to >2 level categorical predictors (i.e. some of the dummies being shrunk to zero, but at least one non-zero, means keeping all levels of the categorical variable)? P.s. if you'd post an answer and no others have any other insights, I'd accept it — IWS, Dec 07 '17 at 08:06
If you use even one level of a categorical variable you are keeping that variable in the model. It's analogous to the spline: if you keep even a single spline term in the model, you are using the associated variable in the model. It's simply a matter of how you will express it. In effect, the Lasso is trying to tell you how to bin and encode the levels of your categorical predictors. For instance: when using effects coding, the zero coefficients indicate those levels to combine with the reference level. (This interpretation potentially could offer some insight into the Lasso results.) — whuber, Dec 07 '17 at 15:32
@whuber: The objective might be to ensure that equivalent parametrizations (e.g. using a b-spline basis vs a truncated power function basis) result in the same fitted model. — Scortchi - Reinstate Monica, Dec 11 '18 at 14:28
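The interpretation suggested in the comments (a variable counts as retained as soon as any of its spline terms or dummy levels has a non-zero coefficient) can be applied to ordinary LASSO output with a small bookkeeping step. This is a sketch under assumptions: the column names, group mapping and coefficient values below are hypothetical, and the coefficients are assumed to have been extracted already, one list of per-class values per model-matrix column.

```python
def retained_variables(coefs, groups, tol=0.0):
    """Map each original variable to its columns that have a non-zero coefficient.

    coefs:  dict of column name -> list of coefficients (one per outcome class).
    groups: dict of variable name -> list of column names derived from it.
    A variable is reported only if at least one of its columns is non-zero
    for at least one outcome class.
    """
    retained = {}
    for variable, columns in groups.items():
        nonzero = [c for c in columns if any(abs(b) > tol for b in coefs[c])]
        if nonzero:
            retained[variable] = nonzero
    return retained


# Hypothetical LASSO output for a 4-class outcome.
coefs = {
    "age": [0.0, 0.0, 0.0, 0.0],
    "age'": [0.0, 0.0, 0.0, 0.0],
    "age''": [0.0, 0.3, 0.0, -0.1],
    "bmi": [0.0, 0.0, 0.0, 0.0],
    "bmi'": [0.0, 0.0, 0.0, 0.0],
}
groups = {"age": ["age", "age'", "age''"], "bmi": ["bmi", "bmi'"]}

kept = retained_variables(coefs, groups)
print(kept)
```

Here `age` would be reported as retained through its `age''` term only, while `bmi` would be reported as dropped.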
