Sometimes AdaGrad is expressed like this:

$\mathbf{x}^{t+1} = \mathbf{x}^t - \frac{\eta}{\sqrt{G^t + \epsilon}} \odot \nabla E$

where $G$ is a diagonal matrix. According to the wiki, the Hadamard product is only defined when the two matrices have the same shape. However, some libraries make such calculations possible through what is called broadcasting. I assume $\frac{\eta}{\sqrt{G^t + \epsilon}} \odot \nabla E$ is still a diagonal matrix, so after the subtraction we would still have a diagonal matrix. So I don't understand how this operation can give us the column vector we want. I can't find any good article or discussion on this.

Could anyone explain?

  • I've seen a lot of literature omitting the fact that they are multiplying everything by a row or column of ones in order to turn the $n$ elements of an $n$-by-$n$ diagonal matrix into an $n$-element vector. Maybe they're doing something similar here. I think a lot of these guys think it's a convenient shortcut, but to me it makes the equations extremely difficult to follow because my first step when I try to work through a matrix equation is to figure out the dimensionality of every term and make sure it all matches up. – Josh Oct 25 '17 at 01:18

1 Answer

The suggestion I made in my comment above turns out to be correct. See the top of page 2123 of the original AdaGrad paper and this helpful explanation. While the "real" $G$ has off-diagonal elements (it's the sum of the outer products of the gradients from $t=1$ to $t=T$), the actual AdaGrad implementation uses only its diagonal (hence the informal "diag" operator in the paper) so that the square root of $G$ can be computed in linear time (with respect to $p$, the number of weights being optimized). If you wanted to express this formally, you could zero out the off-diagonal elements of $G$ and multiply the result by a column of ones; that turns the output of the "diag" operator into a vector of length $p$. And $\nabla E_t$ is simply the vector of gradient values computed at time $t$. Both of these vectors have length $p$, so it all matches up just right. To be clear, the Hadamard product is simply the element-by-element multiplication of its two operands.
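To make the dimensions concrete, here is a minimal sketch of one update step in plain Python, assuming the diagonal of $G$ is stored as a length-$p$ list and that $\epsilon$ sits inside the square root as in the formula from the question. The names `adagrad_step`, `g_diag`, `lr`, `eps`, and the toy objective in `grad_E` are my own choices for illustration and are not taken from the paper or from any library.

```python
import math


def adagrad_step(x, grad, g_diag, lr=0.1, eps=1e-8):
    """One AdaGrad-style update on plain Python lists of length p.

    g_diag holds only the diagonal of G: the running sum of squared
    gradient components, kept as a vector instead of a p-by-p matrix.
    """
    # The i-th diagonal entry of the outer product grad * grad^T is grad_i ** 2.
    new_g_diag = [g + d * d for g, d in zip(g_diag, grad)]
    # Per-coordinate step size eta / sqrt(G_ii + eps): a length-p vector.
    scale = [lr / math.sqrt(g + eps) for g in new_g_diag]
    # Hadamard product of two length-p vectors, followed by the subtraction.
    new_x = [xi - s * d for xi, s, d in zip(x, scale, grad)]
    return new_x, new_g_diag


def grad_E(x):
    # Hypothetical objective E(x) = x_0^2 + 10 * x_1^2, used only as an example.
    return [2.0 * x[0], 20.0 * x[1]]


x = [1.0, 1.0]
g_diag = [0.0, 0.0]
for _ in range(3):
    x, g_diag = adagrad_step(x, grad_E(x), g_diag)
```

Every intermediate quantity here (`grad`, `g_diag`, `scale`, `new_x`) has length $p$, so no $p$-by-$p$ matrix is ever formed and the result of the update is again a vector.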

Josh