
Ridge regression objective:

$$\underset{\beta}{\text{min}} \ \sum_{i=1}^n (y_i - \beta \cdot x_i)^2 + \lambda \|\beta\|_2^2$$

SVM problem in Lagrangian (max-min) form:

$$\begin{align} \max_{\boldsymbol{\alpha}} \quad &\min_{\mathbf{w},b}\ \frac{\|\mathbf{w}\|^2}{2} + C \sum_{i=1}^{N} \alpha^{(i)} \left(1 - y^{(i)}\left(\mathbf{w}^T \phi\left(\mathbf{x}^{(i)}\right) + b\right)\right), \\ \text{s.t.} \quad &0 \leq \alpha^{(i)} \leq C, \quad \forall i \in \{1,\dots,N\} \end{align}$$

Why does the SVM have a hyperparameter $C$ on the hinge-loss term, while in ridge regression there is no $C$ parameter in front of the quadratic loss term?

Likewise, why is there no $\lambda$ parameter in front of $\|\mathbf{w}\|^2$ in the SVM, i.e. why is $\lambda$ implicitly fixed at $1/2$?

Why isn't ridge regression:

$$\underset{\beta}{\text{min}} \ C\sum_{i=1}^n (y_i - \beta \cdot x_i)^2 + \lambda \|\beta\|_2^2$$

Why isn't the SVM:

$$\begin{align} \max_{\boldsymbol{\alpha}} \quad &\min_{\mathbf{w},b}\ \lambda\|\mathbf{w}\|^2 + C \sum_{i=1}^{N} \alpha^{(i)} \left(1 - y^{(i)}\left(\mathbf{w}^T \phi\left(\mathbf{x}^{(i)}\right) + b\right)\right), \\ \text{s.t.} \quad &0 \leq \alpha^{(i)} \leq C, \quad \forall i \in \{1,\dots,N\} \end{align}$$?

Germania

1 Answer


To introduce both hyperparameters in one equation would be redundant.

If you have both $C$ and $\lambda$, then all choices with the same ratio of $C$ to $\lambda$ give equivalent optimization problems. For instance, in your proposed ridge regression equation, the solution for $C=1, \lambda=1$ is necessarily the same as for $C=1000, \lambda=1000$, because you can factor out the common constant without changing the minimizer:

$$\arg\min_{x,y}\ 1x + 1y \;=\; \arg\min_{x,y}\ 1000x + 1000y$$
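Applied to the proposed ridge objective, the same argument shows that dividing through by the positive constant $C$ leaves the minimizer unchanged:

$$\arg\min_{\beta}\ C\sum_{i=1}^n (y_i - \beta \cdot x_i)^2 + \lambda \|\beta\|_2^2 \;=\; \arg\min_{\beta}\ \sum_{i=1}^n (y_i - \beta \cdot x_i)^2 + \frac{\lambda}{C} \|\beta\|_2^2,$$

so only the ratio $\lambda/C$ affects the solution.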

In other words, it looks like you have two hyperparameters ($C$ and $\lambda$), but you truly have only one: the ratio between $C$ and $\lambda$.

By clamping one of the two values, you avoid this wasteful overparameterization. If $C$ is always $1$, then adjusting $\lambda$ is the only way to adjust the ratio; the same goes for keeping $\lambda$ fixed at $1$.
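As a quick numerical check of this equivalence, here is a minimal NumPy sketch on synthetic data; `ridge_solution` is a hypothetical helper that uses the closed-form solution of the two-parameter ridge objective above:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.5, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

def ridge_solution(X, y, C, lam):
    """Minimizer of C * sum_i (y_i - x_i . b)^2 + lam * ||b||^2.

    Setting the gradient to zero gives (C X'X + lam I) b = C X'y,
    which depends on (C, lam) only through the ratio lam / C.
    """
    d = X.shape[1]
    return np.linalg.solve(C * X.T @ X + lam * np.eye(d), C * X.T @ y)

b1 = ridge_solution(X, y, C=1.0, lam=10.0)
b2 = ridge_solution(X, y, C=1000.0, lam=10000.0)  # same ratio lam / C = 10
print(np.allclose(b1, b2))  # True: only the ratio matters
```

Both calls share the ratio $\lambda/C = 10$ and therefore return the same coefficient vector.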

Arya McCarthy
  • How are Elastic Net and other generalizations of the Lasso, with two or more hyperparameters in one equation, non-redundant? – Germania May 14 '21 at 04:13
  • Lasso has two summands and one hyperparameter. ElasticNet has three summands and two hyperparameters. In general, you need $n-1$ hyperparameters for $n$ summands; the remaining one is clamped to $1$. – Arya McCarthy May 14 '21 at 04:20
  • There's no redundancy: it's perfectly possible to introduce multiple parameters into Ridge Regression. See my remarks in the last paragraph at https://stats.stackexchange.com/a/164546/919. – whuber May 14 '21 at 12:08
  • @whuber I think we’re talking past each other. There are perfectly valid opportunities to introduce multiple parameters. In this particular introduction, you gain no expressive power in your model. – Arya McCarthy May 14 '21 at 12:43
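To illustrate the point raised in the comments, the elastic net objective (in one common parameterization) has three summands but only two free hyperparameters:

$$\min_{\beta}\ \|y - X\beta\|_2^2 + \lambda_1 \|\beta\|_1 + \lambda_2 \|\beta\|_2^2$$

The coefficient of the squared-error term is clamped to $1$; multiplying all three terms by a common positive constant would again leave the minimizer unchanged.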