Let $f: \mathbb{R}^n \rightarrow \mathbb{R}$ be a function where $n$ is large and $f$ is non-convex. A sparse minimizer of $f$ can be characterised as
$$ x^* = \arg \min_{x} f(x) + ||x||_0, $$ where $||x||_0$ is the number of nonzero elements of $x$.
This objective is non-differentiable and hard to solve exactly. Are there good references in which this problem is addressed and algorithms with proven convergence are given?
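For concreteness, the kind of algorithm I have in mind is something like iterative hard thresholding (IHT), which is a classical proximal-gradient heuristic for this objective: the proximal operator of $\lambda ||x||_0$ is entrywise hard thresholding. Below is a minimal sketch on a toy least-squares instance (the problem data `A`, `b`, and all parameter values are illustrative assumptions, not from any particular paper):

```python
import numpy as np

def hard_threshold(x, lam, step):
    """Prox of lam*||.||_0 with step size `step`: zero out entries
    whose squared magnitude is below 2*lam*step."""
    out = x.copy()
    out[x**2 < 2 * lam * step] = 0.0
    return out

def iht(grad_f, x0, lam, step, n_iters=200):
    """Iterative hard thresholding: gradient step on f, then prox of lam*||.||_0."""
    x = x0.copy()
    for _ in range(n_iters):
        x = hard_threshold(x - step * grad_f(x), lam, step)
    return x

# Toy instance: f(x) = 0.5*||Ax - b||^2 with a sparse ground truth.
rng = np.random.default_rng(0)
A = rng.standard_normal((50, 20))
x_true = np.zeros(20)
x_true[[2, 7]] = [3.0, -2.0]
b = A @ x_true

grad_f = lambda x: A.T @ (A @ x - b)
step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1/L for the smooth part
x_hat = iht(grad_f, np.zeros(20), lam=0.5, step=step)
print("recovered support:", np.nonzero(x_hat)[0])
```

This kind of scheme comes with convergence guarantees only under restrictive assumptions on $f$ (e.g. restricted strong convexity for least squares), which is exactly why I am asking for references covering the general non-convex case.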
Motivation: I am asking because, in the field of neural network sparsification, I have found one recent paper introducing an algorithm for solving this problem; refer to this one. We can also see that in the neural network sparsification literature, L1 or L0 approximations are not necessarily used. For example, see these two papers: Optimal brain damage and Second order derivatives for network pruning.
If you have seen them, please share your knowledge with me.