After reading some of the helpful comments and answers, I've done some focused reading of my own.
As mentioned in other answers, this process is called pruning, and like many other ideas in the neural network field it is not new. From what I can tell, it originates in LeCun's 1990 paper with the lovely title "Optimal Brain Damage" (the paper cites some earlier work on network minimization from the late 80's, but I didn't go that far down the rabbit hole). The main idea is to approximate the change in the loss caused by removing a parameter, and to remove the parameters that change it the least:
ΔC(h_i) = |C(D|W_0) − C(D|W)|
Where C is the cost function, D is our dataset (of samples x and labels y), W_0 are the original weights of the model and W are the weights after pruning. h_i is the output produced by parameter i, which can be either a full feature map in a convolutional layer or a single neuron in a dense layer.
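For reference, the way OBD itself approximates this change (as far as I understand the paper) is with a second-order Taylor expansion of the cost around the trained weights, keeping only the diagonal of the Hessian; at a local minimum the first-order term vanishes, which gives each parameter a "saliency":

$$\delta C \approx \frac{1}{2}\sum_k h_{kk}\,\delta w_k^2 \quad\Rightarrow\quad s_k = \frac{h_{kk}\,w_k^2}{2}$$

where h_kk is the k-th diagonal entry of the Hessian of C and s_k estimates the increase in the cost from setting w_k to zero; the lowest-saliency parameters are deleted first.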
More recent works on the subject include:
"2016 - Pruning convolutional neural networks for resource efficient inference"
In this paper they propose the following iterative procedure for pruning CNNs in a greedy manner (their figure showing the loop is not reproduced here): starting from a network fine-tuned on the target task, alternate between ranking the neurons/feature maps by an importance criterion, removing the least important one, and fine-tuning again, until the desired trade-off between accuracy and network size is reached. A rough sketch of this loop is given below.

They present and test several criteria for ranking the candidates for pruning. The first and most natural one is oracle pruning, which aims to minimize the difference in accuracy between the full and pruned models. However, it is very costly to compute, requiring ||W_0|| evaluations on the training dataset (one per candidate).
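To make the loop concrete, here is a rough Python sketch of the greedy procedure with an oracle-style criterion. None of this is the authors' code; the callables (`evaluate_cost`, `remove`, `fine_tune`) are placeholders you would supply for your own framework:

```python
def greedy_prune(model, candidates, evaluate_cost, remove, fine_tune, num_iters):
    """Greedy pruning loop with an oracle-style criterion (my own sketch).

    candidates    -- list of feature-map identifiers still present in `model`
    evaluate_cost -- callable: model -> C(D|W) on the training data
    remove        -- callable: (model, candidate) -> copy of model with it pruned
    fine_tune     -- callable: model -> model after a few recovery epochs
    """
    for _ in range(num_iters):
        base_cost = evaluate_cost(model)

        # Oracle criterion: one full evaluation per remaining candidate,
        # which is exactly why it is so expensive in practice.
        deltas = {c: abs(evaluate_cost(remove(model, c)) - base_cost)
                  for c in candidates}

        # Greedily drop the candidate whose removal changes the cost least,
        # then fine-tune before the next pruning step.
        weakest = min(deltas, key=deltas.get)
        model = remove(model, weakest)
        candidates.remove(weakest)
        model = fine_tune(model)

    return model
```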
More heuristic criteria which are much more computationally efficient are:
- Minimum Weight - Assuming that a convolutional kernel with a low L2 norm detects less important features than one with a high norm (a small sketch of this criterion follows the list).
- Activation - Assuming that the activation values of a feature map are smaller for less important features.
- Information Gain - IG(y|x) = H(x) + H(y) − H(x, y), where H is the entropy.
- Taylor Expansion - Based on a first-order Taylor expansion, directly approximating the change in the loss function caused by removing a particular parameter.
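As a tiny illustration of the cheapest of these criteria, here is how one might rank the output channels of a Keras Conv2D layer by the minimum-weight (L2 norm) criterion; the function and layer names are mine, not from the paper:

```python
import numpy as np

def rank_channels_by_weight(conv_layer):
    """Minimum-weight criterion: L2 norm of each output channel's kernel.

    Keras stores Conv2D kernels with shape (kh, kw, in_channels, out_channels),
    so we flatten everything but the last axis and take per-channel norms.
    """
    kernel = conv_layer.get_weights()[0]                       # ignore the bias
    norms = np.linalg.norm(kernel.reshape(-1, kernel.shape[-1]), axis=0)
    return np.argsort(norms)                                   # least important first

# Hypothetical usage on a trained model (layer name made up):
#   order = rank_channels_by_weight(model.get_layer("conv2d_3"))
#   channels_to_prune = order[:10]
```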
"2016 - Dynamic Network Surgery for Efficient DNNs"
Unlike the previous methods, which prune in a greedy way, they incorporate connection splicing into the process: pruned connections can be re-established later if they turn out to be important, turning pruning into continual network maintenance rather than a one-way removal.
With this method they compress the number of parameters in LeNet-5 and AlexNet by factors of 108× and 17.7× respectively, without any loss of accuracy.
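The core mechanism is easy to sketch: keep a binary mask alongside each weight matrix, apply the mask in the forward pass, but keep updating all weights so a pruned connection can be "spliced" back in when its magnitude grows again. The thresholds and names below are my own illustration, not the paper's exact formulation:

```python
import numpy as np

def update_mask(weights, mask, prune_below=0.05, splice_above=0.10):
    """Dynamic-surgery-style mask update (thresholds here are made up).

    Pruning:  entries whose magnitude drops below `prune_below` are masked out.
    Splicing: masked entries whose magnitude grows back above `splice_above`
              are re-enabled. Everything in between keeps its current state.
    The weights themselves are never deleted, only masked, so they keep
    receiving gradient updates and a wrong prune can be undone later.
    """
    new_mask = mask.copy()
    new_mask[np.abs(weights) < prune_below] = 0.0
    new_mask[np.abs(weights) > splice_above] = 1.0
    return new_mask

# In the forward pass one would use the masked weights:
#   effective_weights = weights * mask
```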

The figures and much of what I've written here are taken from the original papers.
Another useful explanation can be found in the following link: Pruning deep neural networks to make them fast and small.
A good tool for modifying trained Keras models is Keras-surgeon. It currently provides easy methods to delete neurons/channels from layers, delete layers, insert layers, and replace layers.
However, I didn't find any methods in it for the actual pruning process (evaluating criteria, optimization, etc.).
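For completeness, removing channels with Keras-surgeon looks roughly like this, based on my reading of its README (function names and Keras/tf.keras compatibility may differ between versions, and the model path, layer name and channel indices are made up):

```python
from keras.models import load_model
from kerassurgeon.operations import delete_channels

# Path, layer name and channel indices below are purely illustrative.
model = load_model("my_cnn.h5")
layer = model.get_layer("conv2d_3")

# Keras-surgeon rebuilds the downstream layers so the returned model
# stays consistent after the channels are removed.
pruned_model = delete_channels(model, layer, [0, 4, 17])

# You would normally fine-tune the pruned model afterwards to recover accuracy.
```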