Let $\beta$ be the parameter vector of a ridge regression.

Now we can say that:

\begin{equation} \frac{\partial \lambda \beta^T \beta}{\partial \beta}=2\lambda\beta. \end{equation}

Why is this?

I thought that $$\frac{d}{dx} x^T x = 2x^T$$

Which would imply that:

\begin{equation} \frac{\partial \lambda \beta^T \beta}{\partial \beta}=2\lambda\beta^{T}. \end{equation}

Daniel De Wet
  • See https://stats.stackexchange.com/questions/257579 and, also, search our site for [matrix cookbook](https://stats.stackexchange.com/search?q=matrix+cookbook). The thread at https://stats.stackexchange.com/questions/234024/question-with-matrix-derivative-why-do-i-have-to-transpose/234071#234071 looks particularly useful concerning the transpose. – whuber Nov 07 '21 at 13:48

1 Answer

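Both are correct; which one you get depends on the layout convention in use. Component-wise, $\lambda\beta^T\beta = \lambda\sum_i \beta_i^2$, so $\partial(\lambda\beta^T\beta)/\partial\beta_j = 2\lambda\beta_j$ in either case. The conventions differ only in how these partial derivatives are arranged: in numerator layout the derivative of a scalar with respect to a column vector is a row vector, $2\lambda\beta^T$, while in denominator layout it is a column vector, $2\lambda\beta$. Unfortunately, most sources do not state which convention they use, so you need to infer it from context.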
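
If it helps, here is a small numerical check of the point above. It is only a sketch using NumPy: the values of `lam` and `beta` and the helper names `ridge_penalty` and `numerical_partials` are made up for illustration. It estimates each partial derivative by central differences and then arranges the same numbers as a column (denominator layout) or a row (numerator layout).

```python
import numpy as np


def ridge_penalty(beta, lam):
    """Scalar ridge penalty lam * beta^T beta."""
    return lam * float(beta @ beta)


def numerical_partials(f, beta, eps=1e-6):
    """Central-difference estimate of each partial derivative df/dbeta_j."""
    partials = np.zeros(beta.size, dtype=float)
    for j in range(beta.size):
        step = np.zeros(beta.size, dtype=float)
        step[j] = eps
        partials[j] = (f(beta + step) - f(beta - step)) / (2 * eps)
    return partials


# Illustrative values, not taken from the question.
lam = 0.5
beta = np.array([1.0, -2.0, 3.0])

partials = numerical_partials(lambda b: ridge_penalty(b, lam), beta)
analytic = 2 * lam * beta  # the entries 2 * lam * beta_j

# Same entries, two arrangements.
denominator_layout = partials.reshape(-1, 1)  # column vector, like 2*lam*beta
numerator_layout = partials.reshape(1, -1)    # row vector, like 2*lam*beta^T

print(np.allclose(partials, analytic, atol=1e-5))
print(denominator_layout.shape, numerator_layout.shape)
```

The entries agree with $2\lambda\beta_j$ in both cases; only the shape of the result changes, which is exactly the difference between the two formulas in the question.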

gunes