6

I am building a neural network model with TensorFlow and Keras in Python. My model performs well on unseen data, and everything is fine, except for one problem that I have no idea how to solve. Suppose my neural network has an input like this

Input = [i1, i2, i3, i4, i5]

and the output of the network is a single value, which we call

Output = O

I want the output of the neural network to be greater than a specific input value; here, for example, I want O > i3. Despite the very good performance of my neural network on the test data (unseen data), in some cases this condition is violated, and that is a problem for me.

Inevitable

3 Answers

13

A dirt-simple solution is to add a regularization term, so your loss function is $\text{loss} + \lambda \text{ReLU} (i_3 - O)$. This adds a penalty whenever your inequality is violated, so the model will tend to respect the constraint.

While this solution is inexact, solving it exactly would be more challenging, because constrained optimization is not something NN libraries are designed for.
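A rough sketch of what this could look like as a custom Keras loss (the MSE base loss, the penalty weight lam, and the assumption that the targets are packed as [y_true, i3] per row are all illustrative choices, not a prescription):

import tensorflow as tf

def penalized_mse(lam=10.0):
    # lam is an illustrative penalty weight; the target tensor is assumed
    # to be packed as [y_true, i3] per row so the constrained input is
    # available inside the loss.
    def loss(packed_true, y_pred):
        y_true = tf.reshape(packed_true[:, 0], (-1, 1))
        i3 = tf.reshape(packed_true[:, 1], (-1, 1))
        mse = tf.reduce_mean(tf.square(y_true - y_pred))
        penalty = tf.reduce_mean(tf.nn.relu(i3 - y_pred))  # nonzero only when O < i3
        return mse + lam * penalty
    return loss

# model.compile(optimizer="adam", loss=penalized_mse(lam=10.0))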

Some related solutions:

Loss function in machine learning - how to constrain?

Sycorax
  • Thank you, I got the concept you explained. Is this solution definite? I mean, if I implement it successfully, will the condition always hold, or is there a risk that I sometimes run into the same issue? For the last part, would you mind appending a code snippet to your answer showing how to implement it in TensorFlow (although I'm looking into how to implement it now)? It would be kind of you. Thank you again – Inevitable Jan 29 '22 at 14:34
  • 4
    This isn’t a code website, and I don’t use Keras/TF, but tensorflow implements addition, multiplication and ReLU. There’s no guarantee that this will always respect the inequality, but choosing a larger $\lambda$ will assign larger penalties to violations of the constraint. – Sycorax Jan 29 '22 at 14:52
  • I figured out how to implement a custom Keras loss function. But what I'm wondering is how to deal with it when we have an array of outputs and inputs (rather than only one). Is it a good idea to sum λReLU(i3−O) over all input/output pairs, or do you have a better idea? – Inevitable Jan 30 '22 at 11:40
  • @Inevitable Each example has loss something like $(\hat y - y)^2 + \lambda \text{ReLU}(i_3 - O)$. Some obvious options are $\sum_{j=1}^N \left[ (\hat y_j - y_j)^2 + \lambda \text{ReLU}(i_{3j} - O_j) \right]$ and $\frac 1 N\sum_{j=1}^N \left[ (\hat y_j - y_j)^2 + \lambda \text{ReLU}(i_{3j} - O_j) \right]$ – Sycorax Jan 30 '22 at 14:00
  • Yup, I know the options, but I thought one of them might take priority, rather than finding the good one by trial and error. – Inevitable Jan 30 '22 at 14:07
  • @Inevitable It's not trial and error -- one is just a rescaling of the other, so they are in a certain sense identical. The choice between them amounts to whether you want to worry about rescaling your learning rate when you change minibatch size. https://stats.stackexchange.com/questions/358786/mean-or-sum-of-gradients-for-weight-updates-in-sgd/358971#358971 – Sycorax Jan 30 '22 at 14:09
  • "constrained optimization is not something NN libraries are designed for" -- if I understood correctly, it would not be possible in general because a 0/1 condition is not differentiable? – Radio Controlled Feb 01 '22 at 11:55
  • @RadioControlled More like gradient descent is an unconstrained optimizer, so if you want to enforce inequality or equality constraints, it's a non-trivial amount of work and not something that is supported out of the box. – Sycorax Feb 01 '22 at 14:40
9

Could you just let the output be unconstrained, and then postprocess by doing something like $O + i_3$? You can even put this directly into your loss function.
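A minimal sketch of how this reparameterization might look in Keras, assuming i3 is the third input column; the layer sizes and the softplus head (which keeps the added amount non-negative) are illustrative choices:

from tensorflow import keras

# Sketch: the network predicts a non-negative delta and the final output
# is O = i3 + delta, so O >= i3 holds by construction.
inputs = keras.Input(shape=(5,))
x = keras.layers.Dense(32, activation="relu")(inputs)
delta = keras.layers.Dense(1, activation="softplus")(x)  # delta >= 0
i3 = keras.layers.Lambda(lambda t: t[:, 2:3])(inputs)    # third input column
output = keras.layers.Add()([i3, delta])                 # O = i3 + delta
model = keras.Model(inputs, output)
model.compile(optimizer="adam", loss="mse")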

Josh Bone
  • 5
    Good suggestion (+1). In other words, if OP is using MSE loss and their target value is $y$, then the loss is computed as $\left(y - (O + i_3)\right)^2$, and similarly for other loss functions. – Sycorax Jan 29 '22 at 15:53
  • Right. The only caveat is that if $O < 0$ you will need to handle that case separately. – Josh Bone Jan 29 '22 at 15:59
  • Why should I add i3 to the output? I don't know by how much the output should be bigger than i3; the only thing I know is that it's always bigger, and the amount is known only to God :). Unfortunately, I'm still searching for how to implement the custom regularization in Keras :( – Inevitable Jan 29 '22 at 16:00
  • 3
    @JoshBone In your case, $O$ is something generated by the network, so we have the freedom to use an activation function to bound it; sigmoid and relu units, for instance, both yield non-negative results, so we can eliminate the $O < 0$ case from consideration if this is a concern. – Sycorax Jan 29 '22 at 16:00
  • 1
    @JoshBone the output is positive-definite – Inevitable Jan 29 '22 at 16:01
  • 3
    @Inevitable To make this more explicit, the network outputs $\delta > 0$ and Josh is defining $O = i_3 + \delta > i_3$. By construction, the constraint must always be satisfied. – Sycorax Jan 29 '22 at 16:57
  • 1
    @Sycorax Yeah, the same solution came to my mind. Instead of training on Output, we can also train on Output − i3, which is always positive, and then add the network's prediction to i3, so the final result is always greater than (or maybe equal to) i3 – Inevitable Jan 29 '22 at 17:03
3

After devoting a good amount of time, I finally figured out how to implement the solution in the Keras/TensorFlow library, with regard to the previous useful answers to my question. First, if we want to implement a custom Keras loss function that takes some parameters and also accesses the inputs, we have to define:

import tensorflow as tf
from tensorflow.keras import backend as K

def custom_loss(alpha):
    def loss(data, y_pred):
        # the targets were padded with the inputs: column 0 holds y_true,
        # column 1 holds the constrained input (i3)
        y_true = tf.reshape(data[:, 0], (-1, 1))
        i3 = tf.reshape(data[:, 1], (-1, 1))
        # mean absolute percentage error as the base loss
        diff = K.abs((y_true - y_pred) / K.clip(K.abs(y_true), K.epsilon(), None))
        # ReLU penalty: nonzero only when the prediction falls below i3
        return 100. * K.mean(diff, axis=-1) + \
               K.mean(alpha * tf.keras.activations.relu(i3 - y_pred))
    return loss

Here I padded the inputs onto the right side of the output tensor, and then inside the function I unpacked them again to access the inputs. I used the mean absolute percentage error as the base loss function and then added the desired condition with the aid of the alpha parameter (as a regularization weight) and the ReLU function. Be careful to use the right column of your input data inside this function. To build the neural network model, the following code has to be used. First, we pad the inputs onto our outputs as follows:

output_train = np.append(y_train, x_train, axis=1)
output_valid = np.append(y_valid, x_valid, axis=1)

In the compile function:

model.compile(loss = custom_loss(alpha=10000))

Here I used 10000 as alpha; obviously, it can be changed depending on the case. Now we can fit the model on our data. But there is another problem when we want to load a saved model: we have to use the following code

model = keras.models.load_model(model_save_address, custom_objects={'loss': custom_loss(10000)})

Now everything is fine, and we can train and test our model on our data easily.
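For completeness, a rough sketch of the training and reloading sequence with the padded targets (the epoch count, batch size, and the name x_test are placeholders, not values from my actual setup):

model.fit(x_train, output_train,
          validation_data=(x_valid, output_valid),
          epochs=100, batch_size=32)  # placeholder settings
model.save(model_save_address)
model = keras.models.load_model(model_save_address,
                                custom_objects={'loss': custom_loss(10000)})
predictions = model.predict(x_test)   # x_test is a placeholder name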

NOTE: First, I thank all the people who helped me solve this issue. I think it's worth noting that before solving it, despite having a good model that performs well with very low error on my data, the desired condition was violated in 50% of the cases, and that was a problem for me. After implementing this solution, the condition is not satisfied in only 0.5% of cases, and I hope to find another solution to reduce this even further.

Inevitable