In MATLAB, I understand that if a number gets closer to zero than realmin, MATLAB converts the double to a denormal. I am noticing that this incurs a significant performance cost. In particular, I am using a gradient descent algorithm in which, near convergence, the gradients (of my bespoke neural network) drop below realmin, and the algorithm then slows down heavily (due to, I am assuming, type conversion behind the scenes). I have used the following code to validate my gradient matrices so that no number falls below realmin:
function mat = validateSmallDoubles(obj, mat, threshold)
    % Zero out every element whose magnitude does not exceed threshold,
    % so that no entry of mat remains in the denormal range.
    mat = mat .* (abs(mat) > threshold);
end
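For context, here is roughly how I apply it during training. The names are stand-ins: gradW is one gradient matrix, and maxDivisor is an assumed upper bound on any divisor applied to the gradients after validation, which gives the threshold some headroom above realmin:

maxDivisor = 1e6;                   % assumption: worst-case later divisor
threshold  = realmin * maxDivisor;  % headroom so mat/maxDivisor stays normal
gradW = obj.validateSmallDoubles(gradW, threshold);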
Is this usual practice, and what value should threshold take? (Obviously you want it as close to realmin as possible, but not so close that any additional division operations send some elements of mat below realmin after validation.) Also, specifically for neural networks, where are the best places to do gradient validation without ruining the network's ability to learn? I would be grateful to know what solutions people with experience in training neural networks have found. I am sure this is a problem in all languages. Tentative threshold values have ruined my network's learning.
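For reference, a minimal timing sketch along these lines (the array size and multiplier are arbitrary choices) can confirm whether subnormal arithmetic itself is the slowdown on a given machine:

n = 1e7;
normals  = rand(n, 1) + 1;           % values safely in the normal range
subnorms = repmat(realmin/2, n, 1);  % subnormal (denormal) values
tic; s1 = normals  .* 1.5; toc       % fast path
tic; s2 = subnorms .* 1.5; toc       % typically much slower if the CPU takes the subnormal path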