4

Why is proximal coordinate descent much less affected by bad conditioning than proximal gradient descent?

For example, we can consider this problem : $\min_x \frac{1}{2}\|Ax-b\|^2_2 + \lambda\|x\|_1$

If A has a large condition number, how can we demonstrate that the algorithm of proximal coordinate descent is much less affected than proximal gradient?

aferjani
  • 66
  • 2

0 Answers0