Why is proximal coordinate descent much less affected by bad conditioning than proximal gradient descent?
For example, we can consider this problem : $\min_x \frac{1}{2}\|Ax-b\|^2_2 + \lambda\|x\|_1$
If A has a large condition number, how can we demonstrate that the algorithm of proximal coordinate descent is much less affected than proximal gradient?