I've read through Hazan's paper on online convex optimization, and I don't quite understand why the regularization term must be strongly convex rather than satisfying a more relaxed condition such as strict convexity.

https://ie.technion.ac.il/~ehazan/papers/shalom.pdf

For instance, we want to compute $$x = \operatorname{argmin}_{x \in X} \left\{ f^Tx + R(x) \right\}$$

Let $f \in \mathbb{R}^n$ and $x \in \mathbb{R}^n$, so $f^Tx$ is a linear function, and let $R(x)$ be a strongly convex regularization term such as $\|x\|^2$. Then the minimizer $x$ is unique, because the objective is strongly convex.

But what if $R(x)$ is merely strictly convex? That is a much more relaxed condition: the objective is still strictly convex, so the minimizer is again unique. What seems to be the issue with using strictly convex regularizers?
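To make the question concrete, here is the kind of example I have in mind (my own, not from the paper): $R(x) = x^4$ on $X = [-1, 1]$ is strictly convex, since $R$ is convex and $R''(x) = 12x^2 > 0$ for all $x \neq 0$, but it is not strongly convex, because strong convexity with modulus $\sigma > 0$ would require $R''(x) \geq \sigma$ everywhere, which fails at $x = 0$.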

Fraïssé
  • It's part of the proof that the algorithm can achieve logarithmic regret, not part of the proof that the algorithm will converge. – jbowman Mar 25 '17 at 14:46

1 Answer


This is actually quite an interesting question. It's best to think of the RFTL algorithm as most naturally working with quadratic norms $R(x)=\|x\|_A^2$ for some positive definite matrix $A$. If you look at the proof, the support for other regularisers is handled by bounding the error by such a norm, where the matrix $A$ is the Hessian of $R$ at some (unknown) point $z$. In order for the Hessian to be positive definite everywhere (and hence induce a well-defined norm everywhere), $R$ needs to be strongly convex. In effect, strongly convex regularisers behave approximately quadratically, at least locally.
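To spell out that step (my paraphrase of the argument, not a quote from the paper): by Taylor's theorem with the mean-value form of the remainder, for some $z$ on the segment between $x$ and $y$, $$R(y) = R(x) + \nabla R(x)^T(y-x) + \tfrac{1}{2}\|y-x\|_{\nabla^2 R(z)}^2,$$ and strong convexity guarantees $\nabla^2 R(z) \succeq \sigma I$ for some $\sigma > 0$ regardless of where $z$ lands, so the remainder is bounded below by $\tfrac{\sigma}{2}\|y-x\|^2$. Under mere strict convexity, $\nabla^2 R(z)$ can come arbitrarily close to singular, and no such uniform bound exists.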

Intuitively, a strictly convex function can be extremely close to flat in some regions of the space. If such a regulariser were used, it would have virtually no effect there, so the algorithm would essentially behave like unregularised follow-the-leader, which suffers linear regret in the worst case.
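Here is a quick numerical sketch of that last point (my own toy experiment, not from the paper; the loss sequence and constants are just illustrative): FTRL on $X = [-1, 1]$ with linear losses $f_t x$ and regulariser $\lambda x^2$, run on the classic alternating sequence that defeats follow-the-leader. With $\lambda$ of order $\sqrt{T}$ the iterates stay in the interior and regret is $O(\sqrt{T})$; with $\lambda \approx 0$ the update degenerates into follow-the-leader, the iterates jump between $\pm 1$, and regret is linear in $T$.

```python
import numpy as np

# Toy FTRL experiment (my own sketch, not code from Hazan's paper):
# linear losses f_t * x on X = [-1, 1], regulariser lam * x^2.
def ftrl(fs, lam):
    """Play x_t = argmin_{x in [-1,1]} (sum_{s<t} f_s) * x + lam * x^2."""
    xs, cum = [], 0.0
    for f in fs:
        # Closed-form unconstrained minimiser -cum/(2*lam), clipped to X.
        xs.append(float(np.clip(-cum / (2.0 * lam), -1.0, 1.0)))
        cum += f
    return np.array(xs)

T = 10_000
# Alternating sequence that makes follow-the-leader oscillate: the
# cumulative gradient flips sign every round, so FTL jumps between
# +1 and -1 and lands on the wrong side each time.
fs = np.array([0.5] + [(-1.0) ** t for t in range(1, T)])
best = min(fs.sum() * x for x in (-1.0, 1.0))  # best fixed action in hindsight

for lam in (np.sqrt(T), 1e-9):  # strong regularisation vs. effectively none
    xs = ftrl(fs, lam)
    print(f"lam = {lam:10.3g}   regret = {fs @ xs - best:10.1f}")
# Typical output: lam = 100 gives regret ~ 25 (order sqrt(T)), while
# lam = 1e-9 gives regret ~ 10000 (linear in T), i.e. FTL behaviour.
```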

AaronDefazio
  • Aha, thanks for the answer. So the issue is that a positive definite Hessian implies strict convexity, but not the other way around. Also, for online linear optimization, i.e. where $f$ is a linear function, what can we say about the convexity of $f^Tx + R(x)$ when $R(x)$ is strongly convex in some norm $\|\cdot\|_p$? For instance, $R(x) = x\log x$ is strongly convex in the 1-norm. Is $f^Tx + R(x)$ also strongly convex in $\|\cdot\|_p$? (I believe it is, but I have nothing to back it up short of a proof.) – Fraïssé Mar 29 '17 at 05:23
  • In finite dimensions, strong convexity in one norm implies strong convexity in every other norm, though the constant of strong convexity depends on the norm. Note also that the function $x\log x$ is only strongly convex on bounded subsets of $\mathbb{R}$. To answer your question: $f^Tx + R(x)$ is strongly convex when $R(x)$ is, since adding a linear term does not change the Hessian. – AaronDefazio Mar 29 '17 at 12:46