
I am working on a project using this dataset. I implemented a neural network for it with Keras, but I cannot get a testing accuracy above 80%. Here are the details:

Number of training examples = 1752

Number of testing examples = 310

Shape of image = (64, 64)

Optimization algorithm = Adam (learning rate = 0.0001)

Number of epochs = 1500

Minibatch size = 32

This is the Keras model code I used:

from keras.models import Sequential
from keras.layers import Dense, Dropout

model = Sequential()
model.add(Dense(30, input_dim=4096, kernel_initializer="uniform",
                bias_initializer="zeros", activation="relu"))
model.add(Dropout(0.3))

model.add(Dense(20, kernel_initializer="uniform",
                bias_initializer="zeros", activation="relu"))
model.add(Dropout(0.3))

model.add(Dense(15, kernel_initializer="uniform",
                bias_initializer="zeros", activation="relu"))
model.add(Dropout(0.3))

model.add(Dense(10, kernel_initializer="uniform",
                bias_initializer="zeros", activation="softmax"))

I tried different values for the dropout probability (keep_prob), but nothing changed much.

Without dropout I get 98% training accuracy and 75% testing accuracy. Adding dropout reduced the variance, but the accuracy is still not good enough. After many trials, the best testing accuracy I have reached so far is 80%, and I cannot get higher than this.

I also tried L2 regularization, and L2 combined with dropout, but nothing changed much. With L2 the training accuracy drops to around 72%, yet the testing accuracy does not increase. How can I increase it?
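(For reference, a minimal sketch of what attaching L2 to a layer looks like in Keras; the 0.01 factor here is only illustrative, not the exact value I used.)

```python
from keras.models import Sequential
from keras.layers import Dense
from keras.regularizers import l2

model = Sequential()
# kernel_regularizer adds an L2 penalty on this layer's weights to the loss
model.add(Dense(30, input_dim=4096, activation="relu",
                kernel_regularizer=l2(0.01)))
```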

Akshat Jain
  • See also: https://stats.stackexchange.com/questions/365778/what-should-i-do-when-my-neural-network-doesnt-generalize-well – Sycorax Sep 13 '18 at 22:25

2 Answers


It's an image. Use a convolutional neural network rather than a fully connected network. Here's one of my favorite tutorials (check out their entire collection of tutorials; it's worthwhile).

https://github.com/aymericdamien/TensorFlow-Examples/blob/master/notebooks/3_NeuralNetworks/convolutional_network.ipynb

Convolutional neural networks take advantage of the spatial locality inherent in images, whereas fully connected networks flatten the image and effectively permute the pixels randomly. Ever tried to look at an image flattened into an array with the pixels randomly permuted? Not easy. Nor is it for a neural network.
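For a sense of scale, a minimal CNN for 64x64 single-channel images and 10 classes could look like this (a sketch, assuming tf.keras; the filter counts and layer sizes are illustrative starting points, not tuned values):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout

model = Sequential([
    # Convolutions see local 3x3 neighborhoods, preserving spatial structure
    Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 1)),
    MaxPooling2D((2, 2)),
    Conv2D(64, (3, 3), activation="relu"),
    MaxPooling2D((2, 2)),
    Flatten(),
    Dense(64, activation="relu"),
    Dropout(0.3),
    Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```

Since your data is currently flattened to 4096 features, it would need reshaping first, e.g. `X = X.reshape(-1, 64, 64, 1)`.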

David Parks

David's answer brought up a valid point: you should try using CNNs to exploit the spatial correlations in the images.

Further suggestions:

  1. High training accuracy combined with significantly lower test accuracy is a sign of overfitting, so you should first fine-tune your model against a validation dataset. For example, you can split your training examples 70-30, holding out 30% as validation data. Once you get reasonably good results there, then test the model's generalization ability on the test dataset.

  2. Since your number of training examples is quite small (and the image size as well), you can try some k-fold stratified or random cross validation. Evenly shuffle and split the training examples into some k number of folds, with each fold having approximately the same distribution of classes. This will let you see your validation accuracy more realistically.

  3. Perform early stopping. 1500 epochs seems excessive for such a small dataset; try a smaller number of epochs and see if your results improve.

  4. Try a better initializer than a plain uniform one, e.g. "He initialization" (https://keras.io/initializers/).
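Suggestions 1, 3, and 4 can be sketched together in Keras like this (assuming tf.keras; the patience value is an illustrative choice, not a recommendation):

```python
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.callbacks import EarlyStopping

model = Sequential([
    # he_normal initialization is well suited to ReLU activations
    Dense(30, input_dim=4096, activation="relu",
          kernel_initializer="he_normal"),
    Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Stop training once validation loss has not improved for 20 epochs,
# keeping the best weights seen so far.
early_stop = EarlyStopping(monitor="val_loss", patience=20,
                           restore_best_weights=True)

# fit() would then use a 30% validation split and the callback, e.g.:
# model.fit(X_train, y_train, validation_split=0.3, epochs=1500,
#           batch_size=32, callbacks=[early_stop])
```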

infomin101