
Given a function $f: \mathbb{R}^n \rightarrow \mathbb{R}$, where $n$ is large and $f$ is non-convex, the following characterises a sparse minimizer of $f$:

$$ x^* = \arg \min_{x} \; f(x) + ||x||_0, $$ where $||x||_0$ is the number of nonzero elements of $x$.

This problem is non-differentiable and hard to solve. I was wondering if there are good references in which this problem is addressed and algorithms with proven convergence are given.
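For concreteness, here is a minimal Python sketch of evaluating this objective; the toy $f$ below is just a placeholder of my own, and only the $||x||_0$ counting term comes from the formulation above:

```python
import numpy as np

def l0_penalty(x, tol=1e-12):
    """Count the entries of x with magnitude above a small tolerance."""
    return int(np.sum(np.abs(x) > tol))

def f(x):
    # Toy smooth non-convex stand-in for the real objective (placeholder only).
    return float(np.sum(np.sin(x) ** 2 + 0.1 * x ** 4))

def penalized_objective(x):
    # f(x) + ||x||_0: the penalty is piecewise constant in x,
    # so the objective is non-differentiable wherever an entry crosses zero.
    return f(x) + l0_penalty(x)

x = np.array([0.0, 1.5, 0.0, -0.3])
print(penalized_objective(x))  # the penalty contributes 2, since two entries are nonzero
```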

Motivation: The reason I am asking is that, in the field of neural network sparsification, I have found one recent paper introducing an algorithm for solving this problem; refer to this one. Also, in the neural network sparsification literature, L1 or L0 approximations are not necessarily used. For example, you can see these two papers: Optimal brain damage and Second order derivatives for network pruning.

If you have seen them, please share your knowledge with me.

Saeed
  • While there are a lot of ways to incorporate pruning in neural networks, the approaches I know of don't prove any convergence (though I may not interpret that word correctly) - many act like heuristics that may or may not give good results. – Sudix Aug 16 '20 at 06:38
  • @Sudix: You are right; that is why I am asking my question. – Saeed Aug 16 '20 at 07:03
  • But then, aren't the papers essentially meaningless to the question? – Sudix Aug 16 '20 at 07:11
  • @Sudix: I added them to give a sense of the question. – Saeed Aug 16 '20 at 16:19

1 Answer


The standard way to address this is to use L1 norm regularization (penalty) instead of L0 regularization. The L1 norm regularization term is convex; therefore, it preserves the convexity of an objective function that is convex without the L1 norm regularization term.

Although the resulting objective function is non-differentiable, depending on the rest of the objective function and the constraints, the optimization problem might be amenable to a highly efficient and robust convex conic formulation and solution, for instance using CVX or CVXPY.
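For illustration, here is a minimal CVXPY sketch of an L1-regularized problem; the least-squares loss is just a convex stand-in for f (an assumption for this example), and the regularization weight is arbitrary:

```python
import cvxpy as cp
import numpy as np

# Least-squares loss as a convex stand-in for f (an assumption for this example).
np.random.seed(0)
m, n = 30, 100
A = np.random.randn(m, n)
b = np.random.randn(m)
lam = 0.5  # regularization weight; problem-dependent

x = cp.Variable(n)
objective = cp.Minimize(cp.sum_squares(A @ x - b) + lam * cp.norm1(x))
prob = cp.Problem(objective)
prob.solve()

# Many entries of x are driven to (numerically) zero -- the sparsity-inducing effect.
print("nonzero entries:", int(np.sum(np.abs(x.value) > 1e-6)))
```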

In practice, L1 norm regularization seems to work well at inducing sparse solutions. But it is a compromise made in the interest of computational tractability, compared with using L0 norm regularization.

There are many sources of information explaining this if you search for "L1 norm regularization".

Mark L. Stone
  • thank you for your reply, but I am looking for algorithms solving these problems, [like this one](https://openreview.net/pdf?id=H1Y8hhg0b). In neural network sparsification, the L1 approximation is not necessarily used. For example, you can see these two papers: [Optimal brain damage](https://dl.acm.org/doi/10.5555/109230.109298) and [Second order derivatives for network pruning](https://papers.nips.cc/paper/647-second-order-derivatives-for-network-pruning-optimal-brain-surgeon.pdf). – Saeed Aug 16 '20 at 05:55
  • If the problem is not too large (value of n, i.e., the dimension of x), you can use a branch-and-bound global optimizer, such as BARON. It can handle the non-differentiability of the L0 "norm" (not really a norm) by formulating it with integer variables (a sketch of that kind of formulation is given after these comments). See https://minlp.com/baron and https://minlp.com/baron-publications . My answer provided a formulation using the L1 norm, for which efficient algorithms exist, depending on the form of f. – Mark L. Stone Aug 16 '20 at 11:02
  • that one is a solver, but I am looking for methods and algorithms. – Saeed Aug 16 '20 at 17:18
  • Read the papers in the publication link in my preceding comment. The algorithms, and their implementation in software, can be very complicated and difficult to understand. The simple, easy-to-understand algorithms are usually lousy. The best solvers are commercial, and some of their details are proprietary and not published - that is reality. – Mark L. Stone Aug 16 '20 at 19:54
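As mentioned in the comment about BARON, the L0 term can be handled with integer variables. Here is a generic big-M sketch of that idea in CVXPY; it is my own illustration, not BARON's internal formulation, and the least-squares stand-in for f, the value of M, and the availability of a mixed-integer-capable solver are all assumptions:

```python
import cvxpy as cp
import numpy as np

# Generic big-M mixed-integer reformulation of  min f(x) + ||x||_0,
# illustrating the "integer variables" idea; NOT BARON's internal formulation.
np.random.seed(0)
m, n = 10, 20
A = np.random.randn(m, n)
b = np.random.randn(m)
M = 10.0  # must be a valid bound on |x_i| at an optimal solution

x = cp.Variable(n)
z = cp.Variable(n, boolean=True)  # z_i = 1 allows x_i to be nonzero

constraints = [x <= M * z, x >= -M * z]
objective = cp.Minimize(cp.sum_squares(A @ x - b) + cp.sum(z))  # f(x) + ||x||_0
prob = cp.Problem(objective, constraints)

# Requires a mixed-integer-capable solver to be installed (e.g., GUROBI, MOSEK,
# or SCIP); otherwise CVXPY raises a SolverError.
prob.solve()
print("nonzero entries:", int(np.sum(np.abs(x.value) > 1e-6)))
```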