Confusion surrounding differentiation with respect to the ridge regression parameter vector

Question

Let $\mathbf{\beta}$ be the parameter vector of a ridge regression.

Now we can say that:

\begin{equation} \frac{\partial \lambda \beta^T \beta}{\partial \beta}=2\lambda\beta. \end{equation}
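One way to see where the factor of 2 comes from is to write the penalty out component by component. Take $\beta = (\beta_1, \dots, \beta_p)^T$ with $p$ coefficients; the symbols $\beta_k$ and $p$ are introduced here only for this derivation:

\begin{equation} \lambda \beta^T \beta = \lambda \sum_{j=1}^{p} \beta_j^2 \quad\Longrightarrow\quad \frac{\partial \lambda \beta^T \beta}{\partial \beta_k} = 2\lambda\beta_k, \qquad k = 1, \dots, p. \end{equation}

Collecting these $p$ partial derivatives into a column vector gives $2\lambda\beta$.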

Why is this?

I thought that $$\frac{d}{dx} x^T x = 2x^T,$$

which would imply that:

\begin{equation} \frac{\partial \lambda \beta^T \beta}{\partial \beta}=2\lambda\beta^{T}. \end{equation}

See https://stats.stackexchange.com/questions/257579 and, also, search our site for [matrix cookbook](https://stats.stackexchange.com/search?q=matrix+cookbook). The thread at https://stats.stackexchange.com/questions/234024/question-with-matrix-derivative-why-do-i-have-to-transpose/234071#234071 looks particularly useful concerning the transpose. — whuber, Nov 07 '21 at 13:48

score 2 · Accepted Answer · answered Nov 07 '21 at 13:05


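Both are correct. Which one you get depends on the layout convention in use: numerator layout or denominator layout. In numerator layout, the derivative of a scalar with respect to a column vector $\beta$ is a row vector, so $\frac{\partial \lambda \beta^T \beta}{\partial \beta} = 2\lambda\beta^T$. In denominator layout it is a column vector, so the same derivative is $2\lambda\beta$. The two results contain the same entries and differ only by a transpose. Unfortunately, most sources do not say explicitly which convention they use, so you need to infer it.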
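The following is a minimal numerical sketch in plain Python. The values for `lam` and `beta` are hypothetical and purely illustrative, and the helper names are made up for this example rather than taken from any library. The script estimates each partial derivative of $\lambda\beta^T\beta$ by central finite differences. It then arranges the same numbers twice: once as a $p \times 1$ column (denominator layout) and once as a $1 \times p$ row (numerator layout).

```python
# Hypothetical illustrative values: any lambda and any beta would work.
lam = 0.5
beta = [1.0, -2.0, 3.0]


def penalty(b):
    """Ridge penalty lambda * b^T b, returned as a scalar."""
    return lam * sum(x * x for x in b)


def numerical_partials(f, b, h=1e-6):
    """Central finite-difference estimate of df/db_k for every k."""
    partials = []
    for k in range(len(b)):
        up = list(b)
        down = list(b)
        up[k] += h
        down[k] -= h
        partials.append((f(up) - f(down)) / (2 * h))
    return partials


def transpose(m):
    """Transpose a matrix stored as a list of rows."""
    return [list(row) for row in zip(*m)]


partials = numerical_partials(penalty, beta)
expected = [2 * lam * x for x in beta]  # analytic entries 2 * lambda * beta_k

# Denominator layout: a p x 1 column, i.e. 2 * lambda * beta.
grad_denominator = [[g] for g in partials]
# Numerator layout: a 1 x p row, i.e. 2 * lambda * beta^T.
grad_numerator = [list(partials)]

print("column (denominator layout):", grad_denominator)
print("row (numerator layout):", grad_numerator)
print("same entries, transposed:", transpose(grad_numerator) == grad_denominator)
```

Both arrangements hold the entries $2\lambda\beta_k$ up to rounding. Transposing one into the other is the only difference between the two expressions in the question.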


gunes
