I am training GoogleNet on the Stanford Cars dataset. It's 8,000 training images of cars with make/model/year labels (e.g., 2004 Toyota Camry).
- I made minimal changes to the network: I just changed the number of outputs of the classifier layers feeding the losses to 196, since I have 196 vehicle classes.
- I used the pretrained Caffe GoogleNet weights as initial weights.
- The image dimensions are scaled to 224x224.
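For reference, my edit looks roughly like this (a sketch against the BVLC GoogLeNet train_val.prototxt; the `_cars` suffix is just a rename I use so the reshaped layer is reinitialized instead of loaded from the pretrained 1000-way weights):

```protobuf
layer {
  name: "loss3/classifier_cars"   # renamed from "loss3/classifier"
  type: "InnerProduct"
  bottom: "pool5/7x7_s1"
  top: "loss3/classifier_cars"
  inner_product_param {
    num_output: 196               # was 1000 (ImageNet classes)
  }
}
# The auxiliary classifiers "loss1/classifier" and "loss2/classifier"
# get the same num_output change (and rename).
```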
I consistently see the following behavior: the loss hovers between 3 and 5, then at some iteration it skyrockets all the way up to 87. You can see it happen at iteration 15400; I'm on iteration 39320 right now and it hasn't moved from 87. The learning rate is only bouncing around slightly.
What causes this kind of behavior? Should I just cut my losses and use the weights from around iteration 15400 for inference?
I0921 19:00:23.580992 36 solver.cpp:218] Iteration 15360 (2.7357 iter/s, 14.6215s/40 iters), loss = 5.07562
I0921 19:00:23.581143 36 solver.cpp:237] Train net output #0: loss1/loss1 = 4.07955 (* 0.3 = 1.22386 loss)
I0921 19:00:23.581161 36 solver.cpp:237] Train net output #1: loss2/loss2 = 3.34162 (* 0.3 = 1.00248 loss)
I0921 19:00:23.581168 36 solver.cpp:237] Train net output #2: loss3/loss3 = 3.1286 (* 1 = 3.1286 loss)
I0921 19:00:23.581182 36 sgd_solver.cpp:105] Iteration 15360, lr = 0.00996795
I0921 19:00:38.185421 36 solver.cpp:218] Iteration 15400 (2.73888 iter/s, 14.6045s/40 iters), loss = 13.0996
I0921 19:00:38.185487 36 solver.cpp:237] Train net output #0: loss1/loss1 = 87.3365 (* 0.3 = 26.201 loss)
I0921 19:00:38.185503 36 solver.cpp:237] Train net output #1: loss2/loss2 = 87.3365 (* 0.3 = 26.201 loss)
I0921 19:00:38.185511 36 solver.cpp:237] Train net output #2: loss3/loss3 = 87.3365 (* 1 = 87.3365 loss)
I0921 19:00:38.185523 36 sgd_solver.cpp:105] Iteration 15400, lr = 0.00996786