
Penalized regression can be used to estimate models where the number of parameters equals or even exceeds the sample size. This situation can arise in log-linear models of large, sparse tables of categorical or count data. In these settings, it is often also desirable to collapse tables by combining levels of a factor when those levels are not distinguishable in how they interact with other factors. Two questions:

  1. Is there a way to use penalized models such as LASSO or elastic net to test for the collapsibility of levels within each factor?
  2. If the answer to the first question is yes, can, and should, this be set up so that the collapse of levels and the estimation of model coefficients occur in a single step?
kjetil b halvorsen
andrewH
    This paper, https://doi.org/10.1177/1471082X16642560, gives a nice overview of what has been done in this area over the last decade or so. – Jorne Biccler May 06 '17 at 20:12
  • 1
    Note: the penalty I discuss below is equation 3.4 in @JorneBiccler 's link. (It's interesting to see that this question has been considered before!) – user795305 May 06 '17 at 20:44
  • Possible duplicate of [Preprocess categorical variables with many values](https://stats.stackexchange.com/questions/227125/preprocess-categorical-variables-with-many-values) – kjetil b halvorsen May 17 '17 at 00:02
  • How can we call this a duplicate to a question that preceded it? – Michael R. Chernick May 17 '17 at 00:48

1 Answer


It is possible. We can use a variant of the fused lasso to accomplish this.

We can use the estimator $$\hat{\beta} = \arg\min_{\beta} \frac{-1}{n} \sum_{i=1}^n \left(y_i \beta^T x_i - e^{\beta^T x_i} \right) + \sum_{\textrm{factors g}} \lambda_g \left(\sum_{j \in g} |\beta_j| + \frac{1}{2} \sum_{j,k \in g} |\beta_j - \beta_k| \right).$$

Note that $\frac{-1}{n} \sum_{i=1}^n \left(y_i \beta^T x_i - e^{\beta^T x_i} \right)$ is the negative Poisson log-likelihood (up to constants and scaling by $n$), i.e. the loss function for log-linear models.
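
To make this concrete, here is a minimal sketch of the estimator using cvxpy. The data, the `groups` mapping of factors to dummy-column indices, and the tuning values `lam` are all hypothetical placeholders, not part of the original answer:

```python
import numpy as np
import cvxpy as cp
from itertools import combinations

# Hypothetical setup: X holds dummy-coded factor levels, y the observed
# counts, and `groups` maps each factor to the columns of its levels.
rng = np.random.default_rng(0)
n, p = 200, 6
X = rng.integers(0, 2, size=(n, p)).astype(float)
y = rng.poisson(np.exp(X @ np.array([0.5, 0.5, 0.0, -0.3, -0.3, 0.1])))
groups = {"factor_A": [0, 1, 2], "factor_B": [3, 4, 5]}
lam = {"factor_A": 0.1, "factor_B": 0.1}    # tuning parameters lambda_g

beta = cp.Variable(p)
eta = X @ beta                               # linear predictors beta^T x_i
loglin_loss = -cp.sum(cp.multiply(y, eta) - cp.exp(eta)) / n

penalty = 0
for g, idx in groups.items():
    penalty += lam[g] * sum(cp.abs(beta[j]) for j in idx)   # lasso part
    # The 1/2 in the formula cancels because combinations() visits each
    # unordered pair (j, k) exactly once.
    for j, k in combinations(idx, 2):                        # fusion part
        penalty += lam[g] * cp.abs(beta[j] - beta[k])

# Needs a solver that handles exponential cones (e.g. ECOS or SCS).
problem = cp.Problem(cp.Minimize(loglin_loss + penalty))
problem.solve()
print(np.round(beta.value, 3))
```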

The fused penalty term $|\beta_j - \beta_k|$ encourages the coefficients within a group to be exactly equal. Equality of coefficients is equivalent to collapsing the $j^{th}$ and $k^{th}$ levels of the factor together, and in the case $\hat{\beta}_j=0$ it is equivalent to collapsing the $j^{th}$ level with the reference level. The tuning parameters $\lambda_g$ can all be set to a common value, but if there are only a few factors it may be better to tune them separately.
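
Continuing the sketch above, one (hypothetical) way to read off the implied collapsing is to bucket levels whose estimated coefficients agree up to a small numerical tolerance, since solvers return near-ties like 0.4999 rather than exact equality:

```python
# Group the levels of each factor by their (rounded) fitted coefficients;
# levels landing in the same bucket are treated as collapsed together.
tol = 1e-3
for g, idx in groups.items():
    buckets = {}
    for j in idx:
        key = round(float(beta.value[j]) / tol)
        buckets.setdefault(key, []).append(j)
    print(g, list(buckets.values()))
```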

The estimator is a minimizer of a convex function, so it can be computed efficiently with standard convex solvers. If a factor has very many levels, however, the number of pairwise differences can get out of hand (a factor with 100 levels already contributes 4,950 difference terms); in that case, knowing more about the plausible patterns of collapse will be necessary to prune the penalty.

Note that this is all accomplished in one step! This is part of what makes lasso-type estimators so cool!


Another interesting approach is the OSCAR estimator, which is like the one above except that the pairwise penalty $\|[-1 \; 1] \cdot [\beta_j \; \beta_k]'\|_1 = |\beta_j - \beta_k|$ is replaced by $\|(\beta_j, \beta_k)\|_\infty = \max(|\beta_j|, |\beta_k|)$.
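
As a rough illustration (reusing the variables from the sketch above, and again only a hedged approximation of the OSCAR idea rather than a reference implementation), the change amounts to swapping the fusion term for a pairwise maximum:

```python
# OSCAR-style variant: replace |beta_j - beta_k| with
# max(|beta_j|, |beta_k|) = ||(beta_j, beta_k)||_inf.
oscar_penalty = 0
for g, idx in groups.items():
    oscar_penalty += lam[g] * sum(cp.abs(beta[j]) for j in idx)
    for j, k in combinations(idx, 2):
        oscar_penalty += lam[g] * cp.maximum(cp.abs(beta[j]), cp.abs(beta[k]))

cp.Problem(cp.Minimize(loglin_loss + oscar_penalty)).solve()
print(np.round(beta.value, 3))
```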

user795305