
Background

Hi all. I'm new to Machine Learning & Cross Validated, so please let me know if I made any mistakes. Any advice to point me in the right direction would be greatly appreciated.

Problem

I created a 2D Convolutional Neural Network Classification Model using this tutorial, and I used my own data.

I was very excited, until I saw the following problems:

  1. The loss increases (or plateaus) as the number of epochs increases
  2. The browser becomes incredibly slow, with high GPU memory usage (1275.60 MB), most likely due to a memory leak

Attempted Solutions

To address Problem #1, I tried optimizing the hyperparameters, as recommended by Sycorax:

  • Adjusted learning rate: decreased from 0.1 to 0.05 and increased from 0.01 to 0.05. Neither increasing nor decreasing it improved the loss.
  • Adjusted batch size: increased from 16 to 32 and decreased from 16 to 8. Neither change improved the loss.
  • Unit Testing: I isolated sections of the CNN and checked if they resulted in the expected output, which they did. For instance, I checked if I was retrieving the correct training data using image(linearSinusoidal[0], 0, 0, width, height), which verified I was indeed importing the right classes for Supervised Learning.

I also checked normalization and other components of the CNN, but to no avail. What can I do to fix this problem?
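A fuller learning-rate sweep would try a log-spaced grid rather than just two values. A minimal sketch of generating the candidates (`logSpace` is a hypothetical helper, not part of ml5.js):

```javascript
// Sketch only: sweep learning rates on a logarithmic grid instead of
// trying just two values. logSpace is a hypothetical helper, not ml5.js.
function logSpace(hi, lo, n) {
  const step = (Math.log10(hi) - Math.log10(lo)) / (n - 1);
  const rates = [];
  for (let i = 0; i < n; i++) {
    rates.push(10 ** (Math.log10(hi) - i * step));
  }
  return rates;
}

// 10 candidates from 1e-1 down to 1e-6; each one would be passed as
// options.learningRate to a freshly constructed model.
const candidates = logSpace(1e-1, 1e-6, 10);
```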

MWE

function setup() {
  let options = {
    inputs: [432, 288, 4],
    task: 'imageClassification',
    debug: true,
    learningRate: 0.05,
    hiddenUnits: 5,
  };

  functionClassifier = ml5.neuralNetwork(options);

  for (let i = 0; i < linearSinusoidal.length; i++){
    functionClassifier.addData({ image: linearSinusoidal[i] },{ label: "Linear Sinusoidal" });
    functionClassifier.addData({ image: quadraticSinusoidal[i] },{ label: "Quadratic Sinusoidal" });
    functionClassifier.addData({ image: cubicSinusoidal[i] },{ label: "Cubic Sinusoidal" });
  }

  functionClassifier.normalizeData();
  functionClassifier.train({epochs: 50}, finishedTraining);

}
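The MWE references `finishedTraining` but omits it. A minimal sketch of what it and a results handler could look like (`gotResults` and `testImage` are assumed names, following ml5.js's `(error, result)` callback convention):

```javascript
// Sketch of the callbacks the MWE omits; gotResults and testImage are
// assumed names, following ml5.js's (error, result) callback style.
function finishedTraining() {
  console.log('Training complete');
  // Classify a held-out image once training is done, e.g.:
  // functionClassifier.classify({ image: testImage }, gotResults);
}

function gotResults(error, results) {
  if (error) {
    console.error(error);
    return;
  }
  // results is sorted by confidence; results[0] holds the top label
  return `${results[0].label} (${(results[0].confidence * 100).toFixed(1)}%)`;
}
```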

https://editor.p5js.org/Refath/sketches/scSel1o4j


BR56
  • You'll need to do more than try 2 learning rates to exclude "learning rate is too large" from consideration. Try a grid of 10 values on a logarithmic scale from `1e-1` to `1e-6`. You'll also need to validate that your neural network is doing what you want. I don't know this language, so I'm not sure what `ml5.neuralNetwork` does or where it's defined, but if you've written the code, you'll need to validate its correctness. Finally, there are a number of additional recommendations in the duplicate thread which are not addressed in your [edit]; I'd also try those. – Sycorax Jul 23 '21 at 17:17
  • For instance, in `pytorch`, a very common newbie mistake is to forget to call `zero_grad()` between iterations, which is almost guaranteed to cause divergence because the gradients are accumulated. I don't know if there are any similar gotchas in this library, but you'll need to verify that they don't exist, perhaps by trying it on a known problem. – Sycorax Jul 23 '21 at 17:21
  • @Sycorax It turned out none of those suggestions worked. However, after many hours of debugging, my final method worked: making the image resolutions 100x smaller and grayscaling them. The loss function is now decreasing! – BR56 Jul 29 '21 at 14:55
  • Glad to hear you got it working. I'll reopen this thread so that you can write up an answer explaining how you determined that this was the problem and how you solved it. – Sycorax Jul 29 '21 at 14:59
  • Thanks @Sycorax – BR56 Jul 29 '21 at 15:01

1 Answer


To fix the problem of the increasing loss function, I drastically decreased the image resolution and made the images grayscale. They were downsized from 432 (width) × 288 (height) × 4 (RGBA) = 497,664 input values to 36 × 36 = 1,296 grayscale pixels, a reduction of roughly 99.7%.
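The grayscale step can be sketched per pixel in plain JavaScript (in the p5.js sketch itself this would typically be done with `img.resize(36, 36)` and `img.filter(GRAY)`):

```javascript
// Sketch of the per-pixel grayscale conversion: collapse each RGBA
// quadruple to a single luminance value, shrinking the input from
// four channels to one.
function toGrayscale(rgba) {
  const gray = new Array(rgba.length / 4);
  for (let i = 0; i < gray.length; i++) {
    const r = rgba[4 * i], g = rgba[4 * i + 1], b = rgba[4 * i + 2];
    gray[i] = Math.round(0.299 * r + 0.587 * g + 0.114 * b); // ITU-R BT.601 weights
  }
  return gray;
}
```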

The validation accuracy was still hovering around 53%–80%, as I was only running 5 epochs and training with 10 samples per class. After increasing to 100 epochs and 40 samples per class, validation accuracy jumped to 91% for Linear and Quadratic instances, and the loss fell to about 0.01.

In addition, saving the trained model (with functionClassifier.save()) and loading it on later runs removes the need to retrain on every execution of the program:

  const modelDetails = {
    model: 'model/model.json',
    metadata: 'model/model_meta.json',
    weights: 'model/model.weights.bin'
  }
  
  functionClassifier.load(modelDetails, modelLoaded);

