
I'm trying to implement a very simple MLP with a single hidden layer for a toy regression problem with one input variable (dimension = 1) and one target (dimension = 1). It's a simple curve-fitting problem with zero noise.

MATLAB - Deep Learning Toolbox

Using Levenberg-Marquardt backpropagation on an MLP with a single hidden layer of 100 neurons and hyperbolic tangent activation, I got pretty decent performance with almost zero effort:

MSE = 7.18e-08

Here's a plot of the fit: [image not included]

This is the working MATLAB code. Note that "feedforwardnet(100)" simply creates a network object with one hidden layer of 100 neurons with tanh activation and an output layer with linear activation:

net = feedforwardnet(100);
net.trainParam.min_grad = 1e-25;
net.trainParam.max_fail = 50;
net.trainParam.epochs = 500;
%net.trainParam.showWindow = false;
net.inputs{1,1}.processFcns = {};    % disable automatic input preprocessing
net.outputs{1,2}.processFcns = {};   % disable automatic output postprocessing
net = train(net,Train_Vars,Train_Target);
Test_Predictions = net(Test_Vars);
Accuracy = msemetric({Test_Predictions},{Test_Target});
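
For context on the optimizer difference: "feedforwardnet" trains with "trainlm" (Levenberg-Marquardt) by default, which uses the Jacobian of the residuals rather than only the gradient of the loss. The following is a minimal NumPy sketch of a Levenberg-Marquardt loop for a one-hidden-layer tanh network. It is not MATLAB's implementation: the function names (fit_lm, forward, jacobian), the 10-unit hidden layer, the damping schedule, and the sin(3x) stand-in data are all assumptions chosen for illustration.

```python
import numpy as np


def unpack(p, h):
    """Split the flat parameter vector into w1, b1, w2 (length h each) and scalar b2."""
    return p[:h], p[h:2 * h], p[2 * h:3 * h], p[3 * h]


def forward(p, x, h):
    """Return predictions and hidden activations for 1-D inputs x."""
    w1, b1, w2, b2 = unpack(p, h)
    a = np.tanh(np.outer(x, w1) + b1)          # shape (n, h)
    return a @ w2 + b2, a


def jacobian(p, x, h):
    """Analytic Jacobian of the predictions with respect to all parameters."""
    w1, b1, w2, b2 = unpack(p, h)
    _, a = forward(p, x, h)
    da = (1.0 - a ** 2) * w2                   # d pred / d pre-activation
    return np.hstack([da * x[:, None], da, a, np.ones((x.size, 1))])


def fit_lm(x, y, h=10, iters=200, mu=1e-2, seed=0):
    """Illustrative Levenberg-Marquardt loop (assumed damping schedule)."""
    rng = np.random.default_rng(seed)
    p = rng.normal(size=3 * h + 1)
    r = forward(p, x, h)[0] - y                # residuals
    history = [np.mean(r ** 2)]
    for _ in range(iters):
        J = jacobian(p, x, h)
        # Damped Gauss-Newton step: (J^T J + mu I) step = J^T r
        step = np.linalg.solve(J.T @ J + mu * np.eye(p.size), J.T @ r)
        p_new = p - step
        r_new = forward(p_new, x, h)[0] - y
        mse_new = np.mean(r_new ** 2)
        if mse_new < history[-1]:              # accept the step, trust the model more
            p, r = p_new, r_new
            history.append(mse_new)
            mu = max(mu / 10.0, 1e-10)
        else:                                  # reject the step, increase damping
            mu = min(mu * 10.0, 1e10)
    return p, history


# Stand-in noiseless curve (not the data from the question)
x = np.linspace(-1.0, 1.0, 200)
y = np.sin(3.0 * x)
params, history = fit_lm(x, y)
print(f"initial MSE {history[0]:.3e}, final MSE {history[-1]:.3e}")
```

Each step solves a small linear system built from the full Jacobian, so this kind of method works on the whole training set at once rather than on mini-batches, and its cost grows quickly with the number of parameters.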

Python - TensorFlow - Keras

With the same network settings I used in MATLAB, the network barely trains, no matter how hard I try to tune the training parameters or switch optimizers.

MSE = 0.12900154


I can get something better using ReLU activations in the hidden layer, but it's still far from the MATLAB result:

MSE = 0.0582045


This is the code I used in Python:

#  IMPORT LIBRARIES
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf
from tensorflow import keras


#  IMPORT DATASET FROM CSV FILE, SHUFFLE TRAINING SET 
#  AND MAKE NUMPY ARRAY FOR TRAINING (DATA ARE ALREADY NORMALIZED)
dataset_path = "C:/Users/Rob/Desktop/Learning1.csv"
Learning_Dataset = pd.read_csv(dataset_path, comment='\t', sep=",",
                               skipinitialspace=False)
Learning_Dataset = Learning_Dataset.sample(frac = 1)  # SHUFFLING


test_dataset_path = "C:/Users/Rob/Desktop/Test1.csv"
Test_Dataset = pd.read_csv(test_dataset_path, comment='\t', sep=",",
                           skipinitialspace=False)


Learning_Target = Learning_Dataset.pop('Target')
Test_Target = Test_Dataset.pop('Target')

Learning_Dataset = np.array(Learning_Dataset,dtype = "float32")
Test_Dataset = np.array(Test_Dataset,dtype = "float32")
Learning_Target = np.array(Learning_Target,dtype = "float32")
Test_Target = np.array(Test_Target,dtype = "float32")





#  DEFINE SIMPLE MLP MODEL
inputs = tf.keras.layers.Input(shape=(1,))
x = tf.keras.layers.Dense(100, activation='relu')(inputs)
y = tf.keras.layers.Dense(1)(x)
model = tf.keras.Model(inputs=inputs, outputs=y)




#  TRAIN MODEL
opt = tf.keras.optimizers.RMSprop(learning_rate = 0.001,
                                  rho = 0.9,
                                  momentum = 0.0,
                                  epsilon = 1e-07,
                                  centered = False)
early_stop = keras.callbacks.EarlyStopping(monitor='val_loss', patience=100)
model.compile(optimizer = opt,
              loss = 'mse',
              metrics = ['mse'])


model.fit(Learning_Dataset, 
          Learning_Target,  
          epochs=500, 
          validation_split = 0.2, 
          verbose=0, 
          callbacks=[early_stop], 
          shuffle = False, 
          batch_size = 100)




#  INFERENCE AND CHECK ACCURACY
Predictions = model.predict(Test_Dataset)
Predictions = Predictions.reshape(-1)  # FLATTEN (N, 1) TO (N,) FOR ANY TEST SET SIZE

print(np.square(np.subtract(Test_Target,Predictions)).mean()) #  MSE

plt.plot(Test_Dataset,Test_Target,'o',Test_Dataset,Predictions,'o')
plt.legend(('Target','Model Prediction'))
plt.show()

What am I doing wrong?

Thanks

  • Also relevant: https://stats.stackexchange.com/questions/438064/why-does-gradient-descent-fail-training-a-network-for-predicting-times-table/438069#comment817257_438069 – Sycorax Dec 09 '19 at 15:57
  • Both the (unique) variable and the target are already normalized, as you can see in the plots. I don't really understand what's going on here. The same code produces very good results on another problem (classification). – user191143 Dec 09 '19 at 16:06
  • So maybe it's not an issue with scaling. Both threads provide a number of recommendations to try and debug in addition to scaling. – Sycorax Dec 09 '19 at 16:18
  • @SycoraxsaysReinstateMonica thank you very much for your help. Unfortunately I've already tried everything suggested in that post (normalization, tuning training parameters and switching optimizers). My best guess so far is that the optimizers in TensorFlow are not as good at this particular task as the Levenberg-Marquardt algorithm implemented in MATLAB... Any other ideas? – user191143 Dec 09 '19 at 16:30
  • The third suggestion in https://stats.stackexchange.com/questions/438064/why-does-gradient-descent-fail-training-a-network-for-predicting-times-table/438069#comment817257_438069 is that LM is much better than gradient descent. There are probably additional differences between the MATLAB code and the Keras code, beyond the choice of optimizer, so to understand why the two methods produce different results, you'll have to understand what each is doing. I'd start by reading the documentation. – Sycorax Dec 09 '19 at 16:36
  • @SycoraxsaysReinstateMonica in my experience too, LM is WAY better than almost every other training algorithm in terms of the accuracy it reaches. But now I need to move from MATLAB to TensorFlow, and I really can't accept that I can't fit a curve like that with a single-hidden-layer network in TensorFlow :D If I add several layers with a lot of neurons I can reach a barely decent fit in TensorFlow, but MATLAB proves that one layer is more than enough!! I need a solution. – user191143 Dec 09 '19 at 16:47
  • You're using `shuffle=False` in the `fit` call. If your data are sorted, then that would explain why your plot doesn't fit the largest 20% of observations, the proportion of observations allocated to the validation set. So there's one difference. You can probably find more if you write unit tests to compare each step of the Keras and Matlab code. Making a neural network is hard. Comparing two software implementations is hard. There's no way around the fact that you'll have to do a lot of work to get a good result. – Sycorax Dec 09 '19 at 16:54
  • Let us [continue this discussion in chat](https://chat.stackexchange.com/rooms/101980/discussion-between-user191143-and-sycorax-says-reinstate-monica). – user191143 Dec 09 '19 at 16:55
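
The point about `shuffle=False` and the validation split can be illustrated without Keras. Keras documents that `validation_split` holds out the last samples of the arrays passed to `fit`, before any shuffling. The sketch below mimics that rule in NumPy; keras_style_split is a hypothetical helper written for this illustration, and the sorted sin(3x) data are a stand-in, not the CSV from the question. Note that the question's code shuffles the DataFrame with `sample(frac = 1)` before converting it to arrays, so the sorted case only applies if that shuffle is skipped.

```python
import numpy as np


def keras_style_split(x, y, validation_split):
    """Hypothetical helper: hold out the LAST fraction of the arrays, with no shuffling."""
    split_at = int(len(x) * (1.0 - validation_split))
    return (x[:split_at], y[:split_at]), (x[split_at:], y[split_at:])


# Sorted stand-in inputs (assumption: data ordered by x)
x = np.linspace(-1.0, 1.0, 1000)
y = np.sin(3.0 * x)

# Sorted data: every validation input lies above every training input,
# so the model never sees the right-hand end of the curve during training.
(train_x, _), (val_x, _) = keras_style_split(x, y, 0.2)
print("sorted:   train max", train_x.max(), "val min", val_x.min())

# Shuffling once before the split spreads both sets over the whole range.
rng = np.random.default_rng(0)
perm = rng.permutation(len(x))
(train_x_s, _), (val_x_s, _) = keras_style_split(x[perm], y[perm], 0.2)
print("shuffled: train max", train_x_s.max(), "val min", val_x_s.min())
```

With sorted inputs the training range stops near x = 0.6, which would match a fit that misses the largest 20% of observations.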
