Intuitive understanding and practice of Group Lasso

Question

I have been searching for a straight answer about the use of dummy variables in Lasso regression. I understand why we need to group them, but I could not find any clear information on how we actually group them. My understanding is as follows (let's assume my categorical variable is crash type, which has 4 levels):

Crash Type 1	Crash Type 2	Crash Type 3
1	0	0
0	1	0
1	0	0

I merge levels 2 and 3 into a single variable and use only this merged variable in my Lasso in place of the two separate columns (Crash Type 2-3; I do not include crash type since it is my reference point):

Crash Type 1	Crash Type 2-3
1	0
0	1
1	0

Do I understand this correctly? Is this what is meant by group lasso?

Thanks

See https://web.stanford.edu/~hastie/StatLearnSparsity/ for a definition of the group lasso (4.3 The Group Lasso page 58) — Adrian, Mar 04 '21 at 05:32

Scortchi - Reinstate Monica · Answer 1 · 2021-03-05T10:59:15.597

For Group LASSO you just use any coding scheme that gives a sub-matrix of full rank for each group; e.g. reference-level coding for a categorical predictor, in this case giving columns of indicator variables for 'Crash Type 2' & 'Crash Type 3'. The L1-norm penalty is applied to the groupwise L2-norms of the coefficients, ensuring that either all coefficients in a group are shrunk to zero or none of them. And usually you orthonormalize each sub-matrix, ensuring that equivalent coding schemes result in equivalent models (e.g. it would make no odds if you picked 'Crash Type 3' as the reference level). See "Why use group lasso instead of lasso?".
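A minimal numpy sketch of that recipe follows. It assumes a squared-error loss, a made-up 4-level crash-type factor with a real effect plus a made-up 3-level factor with no effect, and a plain proximal-gradient solver written for illustration (not any particular package's implementation). Each group gets reference-level dummies, is centred and orthonormalized with a QR decomposition, and the penalty lambda * sum_g sqrt(p_g) * ||beta_g||_2 is applied by block soft-thresholding, so a group's coefficients are zeroed together or not at all. The sqrt(p_g) group-size weight is a common convention assumed here; the helper names are hypothetical.

```python
import numpy as np

def reference_dummies(codes, n_levels, ref):
    """Reference-level coding: one indicator column per non-reference level."""
    levels = [k for k in range(n_levels) if k != ref]
    return np.column_stack([(codes == k).astype(float) for k in levels])

def orthonormalize(block):
    """Centre a group's columns, then orthonormalize so that X_g^T X_g / n = I."""
    n = block.shape[0]
    centred = block - block.mean(axis=0)
    q, _ = np.linalg.qr(centred)
    return q * np.sqrt(n)

def group_lasso(blocks, y, lam, n_iter=2000):
    """Proximal gradient for (1/2n)||yc - X b||^2 + lam * sum_g sqrt(p_g) ||b_g||_2."""
    X = np.hstack(blocks)
    n = X.shape[0]
    sizes = [b.shape[1] for b in blocks]
    bounds = np.cumsum([0] + sizes)
    yc = y - y.mean()                                     # unpenalized intercept
    step = 1.0 / np.linalg.eigvalsh(X.T @ X / n).max()    # 1 / Lipschitz constant
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = beta + step * X.T @ (yc - X @ beta) / n       # gradient step
        for g, p in enumerate(sizes):
            zg = z[bounds[g]:bounds[g + 1]]
            norm = np.linalg.norm(zg)
            shrink = max(0.0, 1.0 - step * lam * np.sqrt(p) / norm) if norm > 0 else 0.0
            z[bounds[g]:bounds[g + 1]] = shrink * zg      # block soft-thresholding
        beta = z
    fitted = y.mean() + X @ beta
    return [beta[bounds[g]:bounds[g + 1]] for g in range(len(sizes))], fitted

# Simulated, hypothetical data.
rng = np.random.default_rng(0)
n = 200
crash = rng.integers(0, 4, size=n)       # 4-level "crash type"
noise = rng.integers(0, 3, size=n)       # 3-level factor with no effect
y = np.array([0.0, 1.0, 1.5, -1.0])[crash] + rng.normal(size=n)

noise_block = orthonormalize(reference_dummies(noise, 3, ref=0))
blocks_ref0 = [orthonormalize(reference_dummies(crash, 4, ref=0)), noise_block]
blocks_ref3 = [orthonormalize(reference_dummies(crash, 4, ref=3)), noise_block]

# Smallest lambda at which every group is zero.
lam_max = max(np.linalg.norm(b.T @ (y - y.mean()) / n) / np.sqrt(b.shape[1])
              for b in blocks_ref0)
coefs, fit0 = group_lasso(blocks_ref0, y, lam=0.5 * lam_max)
_, fit3 = group_lasso(blocks_ref3, y, lam=0.5 * lam_max)
print([np.round(c, 3) for c in coefs])
print(np.allclose(fit0, fit3))
```

Switching the crash-type reference level from 0 to 3 should leave the fitted values unchanged: after centring and orthonormalization both codings span the same column space and differ only by a rotation within the group, which does not change ||beta_g||_2.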

Even if you're using the ordinary LASSO, you still don't need to merge levels of categorical variables if you don't mind that some coefficients might be shrunk to zero & others not. (Some people overparametrize by using 'one-hot' encoding—including an indicator variable for every level— to avoid the model's depending on an arbitrary choice of reference level, though that doesn't apply to other situations where a predictor is represented by multiple columns in the design matrix.) An arbitrary merging of levels doesn't make much sense, & destroys potentially predictive information. If sparsity of level coefficients within a categorical variable is in fact among your goals, there are principled ways of merging levels—you might be interested in some version of the Fused Lasso (see Penalized methods for categorical data: combining levels in a factor).
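To make the point about merging concrete, here is a small numpy sketch on simulated data. The effect sizes are invented, and level codes are hypothetical: code 0 stands in for the unnamed reference level, and codes 1, 2, 3 for 'Crash Type 1', 'Crash Type 2' and 'Crash Type 3'. Collapsing 'Crash Type 2' and 'Crash Type 3' into one indicator column is the same as forcing their two coefficients to be equal, so the merged model is nested in the full one and its least-squares residual sum of squares can never be smaller.

```python
import numpy as np

# Simulated data; code 0 is the (hypothetical) reference level.
rng = np.random.default_rng(1)
n = 300
crash = rng.integers(0, 4, size=n)
effects = np.array([0.0, 0.5, 2.0, -1.0])      # invented: types 2 and 3 differ
y = effects[crash] + rng.normal(scale=0.5, size=n)

d = np.column_stack([(crash == k).astype(float) for k in (1, 2, 3)])
X_full = np.column_stack([np.ones(n), d])                             # intercept, CT1, CT2, CT3
X_merged = np.column_stack([np.ones(n), d[:, 0], d[:, 1] + d[:, 2]])  # intercept, CT1, CT2-3

# Any merged-model coefficient vector is a full-model vector with beta_2 == beta_3.
b = np.array([0.3, 0.7, -0.2])
b_full = np.array([0.3, 0.7, -0.2, -0.2])
print(np.allclose(X_merged @ b, X_full @ b_full))   # True

def rss(X, y):
    """Residual sum of squares of an ordinary least-squares fit."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.sum((y - X @ coef) ** 2))

rss_full, rss_merged = rss(X_full, y), rss(X_merged, y)
print(rss_full, rss_merged)   # the merged fit cannot do better than the full one
```

A fused-lasso-style penalty on the difference between the two coefficients imposes that same equality only when the data support it, rather than by fiat.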
