
Problem statement

Use the definition of convexity of a function, i.e., that for any $\boldsymbol{x}$, $\boldsymbol{y} \in \mathbb{R}^{d}$ and $\lambda \in \left [0,1 \right ]$ we have \begin{align*} f(\lambda \boldsymbol{x} +(1-\lambda)\boldsymbol{y} ) \leq \lambda f(\boldsymbol{x}) + (1-\lambda)f(\boldsymbol{y}) \end{align*} to show that if $f$ is convex and differentiable at $\boldsymbol{x}$, then \begin{align*} f(\boldsymbol{y}) \geq f(\boldsymbol{x}) + \nabla f(\boldsymbol{x})^{\top} (\boldsymbol{y}-\boldsymbol{x}) \end{align*} for all $\boldsymbol{y} \in \mathbb{R}^{d}$. (Use the definition of the directional derivative.)

In order to get to the desired result, I have tried using the definition of a convex function together with an illustration. I am unsure whether my reasoning is correct and believe there must be a way to derive this mathematically, but unfortunately I don't have a strong maths background. I found a similar question on Mathematics Stack Exchange here, but it doesn't really answer my question. I am posting the question on CV since I haven't received any response there.

Attempted proof

[figure: the asker's illustration accompanying the attempted proof]

Summary

I have tried to prove this by example, but am looking for an analytical solution.

Any help will be greatly appreciated :)

Stochastic

1 Answer


It simplifies the notation to work with

$$g(\mathbf{y}) = f(\mathbf{y}+\mathbf{x}) - f(\mathbf{x})$$

because (as you can readily compute)

$$g(\mathbf{0}) = 0;\ \nabla g(\mathbf{0}) = \nabla f(\mathbf{x});$$

and $f$ is convex if and only if $g$ is. In particular, note that for any $0\le h\le 1,$ the convexity of $g$ means

$$g(h\mathbf{y}) = g((1-h)\mathbf{0} + h\mathbf{y}) \le (1-h) g(\mathbf{0}) + h g(\mathbf{y}) = h g(\mathbf{y}).\tag{*}$$
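As a quick numerical sanity check (not part of the proof), the inequality $(*)$ can be verified on random inputs for a sample convex function; the particular choice $f(\mathbf{v}) = \|\mathbf{v}\|^2$ below is my own illustration, not anything from the question:

```python
import numpy as np

rng = np.random.default_rng(0)

def f(v):
    # sample convex function: f(v) = ||v||^2 (any convex f would do)
    return float(v @ v)

x = rng.normal(size=3)                 # arbitrary base point
g = lambda y: f(y + x) - f(x)          # shifted function, so g(0) = 0

# check (*): g(h*y) <= h*g(y) for h in (0, 1]
for _ in range(1000):
    y = rng.normal(size=3)
    h = rng.uniform(0.0, 1.0)
    assert g(h * y) <= h * g(y) + 1e-12
print("inequality (*) verified on random samples")
```
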

Recall the definition of the directional derivative and its relationship to the gradient: given a vector $\mathbf{y}$ based at $\mathbf{0},$

$$\nabla g(\mathbf{0})^\prime \mathbf{y} = \nabla_\mathbf{y}g(\mathbf{0}) = \lim_{h\to 0} \frac{g(h\mathbf{y}) - g(\mathbf{0})}{h} = \lim_{h\to 0} \frac{g(h\mathbf{y})}{h}.$$

Because you implicitly assume $g$ is differentiable at $0,$ this limit will be attained as $h$ is shrunk through positive values only. Noting that such values of $h$ eventually are in the range $(0,1],$ we may apply the inequality $(*)$ to conclude
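To see the one-sided difference quotient $g(h\mathbf{y})/h$ converging to $\nabla g(\mathbf{0})^\prime \mathbf{y} = \nabla f(\mathbf{x})^\prime \mathbf{y}$, here is a small sketch (again using the illustrative choice $f(\mathbf{v}) = \|\mathbf{v}\|^2$, for which $\nabla f(\mathbf{x}) = 2\mathbf{x}$):

```python
import numpy as np

def f(v):
    return float(v @ v)            # convex, with gradient 2v

x = np.array([1.0, -2.0, 0.5])
y = np.array([0.3, 0.7, -1.2])
g = lambda z: f(z + x) - f(x)      # g(0) = 0, ∇g(0) = ∇f(x) = 2x

grad_dot_y = float(2 * x @ y)      # the directional derivative ∇g(0)'y
for h in [1e-1, 1e-3, 1e-5]:
    quotient = g(h * y) / h        # (g(hy) - g(0))/h, since g(0) = 0
    print(h, quotient, abs(quotient - grad_dot_y))  # gap shrinks with h
```
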

$$\begin{aligned} \nabla f(\mathbf{x})^\prime \mathbf{y} &= \nabla g(\mathbf{0})^\prime \mathbf{y} \\ &= \lim_{h\to 0^+} \frac{g(h\mathbf{y})}{h} \le \lim_{h\to 0^+} \frac{hg(\mathbf{y})}{h} \\&= \lim_{h\to 0^+}g(\mathbf{y}) = g(\mathbf{y}) \\ &= f(\mathbf{x}+\mathbf{y}) - f(\mathbf{x}), \end{aligned}$$

and substituting $\mathbf{y}-\mathbf{x}$ for $\mathbf{y}$ turns this into the desired inequality $f(\mathbf{y}) \ge f(\mathbf{x}) + \nabla f(\mathbf{x})^\prime (\mathbf{y}-\mathbf{x})$. QED.


If you don't like the initial change of $f$ to $g,$ go back and apply the argument directly to $f.$ There's a little bit of distracting algebra but nothing essential will change.
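For readers who want to see the final inequality in action, here is a numerical check (my own addition, not part of the argument) using the smooth convex log-sum-exp function, whose gradient is the softmax:

```python
import numpy as np

def f(v):
    # log-sum-exp: a standard smooth convex function
    m = v.max()
    return float(m + np.log(np.exp(v - m).sum()))

def grad_f(v):
    e = np.exp(v - v.max())
    return e / e.sum()             # softmax = ∇f(v)

rng = np.random.default_rng(1)
for _ in range(1000):
    x = rng.normal(size=4)
    y = rng.normal(size=4)
    lhs = f(y)
    rhs = f(x) + grad_f(x) @ (y - x)
    assert lhs >= rhs - 1e-10      # first-order condition f(y) >= f(x) + ∇f(x)'(y-x)
print("first-order condition verified on random samples")
```
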

whuber