The price of an option (in finance) is given by the famous Black-Scholes equation. I would like to design a neural network to predict the price of an option. Basically the inputs are the attributes of the option and the output is the price. So we are essentially learning a deterministic non-linear function.
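For reference, here is a minimal sketch of the target function the network has to learn. It assumes a European call with an optional continuous dividend yield; the names `norm_cdf`, `bs_call_price` and the argument names are illustrative, not taken from the paper.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def bs_call_price(S: float, K: float, T: float, r: float,
                  sigma: float, q: float = 0.0) -> float:
    """Black-Scholes price of a European call (assumed setting).

    S: spot price, K: strike, T: time to maturity in years,
    r: risk-free rate, sigma: volatility, q: dividend yield.
    """
    sqrt_T = math.sqrt(T)
    d1 = (math.log(S / K) + (r - q + 0.5 * sigma ** 2) * T) / (sigma * sqrt_T)
    d2 = d1 - sigma * sqrt_T
    return (S * math.exp(-q * T) * norm_cdf(d1)
            - K * math.exp(-r * T) * norm_cdf(d2))


# Example: an at-the-money call with one year to maturity.
example_price = bs_call_price(S=100.0, K=100.0, T=1.0, r=0.05, sigma=0.2)
print(round(example_price, 4))
```

Since the labels come from a closed-form formula, there is no noise in the targets, so any remaining in-sample error comes from the approximation or the optimization rather than from the data.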
My issue is that the in-sample mean squared error is not low enough for my purposes. I believe this is a bias (underfitting) problem, since the in-sample error is too high and the in-sample and out-of-sample errors are very similar.
I've replicated the results in this paper: https://srdas.github.io/Papers/BlackScholesNN.pdf
Now I am trying to optimize the network they present in the paper. They leave network optimization for further work in their paper.
Basically, I don't have any intuition for which network architecture to use. I have tried a few obvious things like increasing the number of layers, number of nodes per layer, different activation functions, but it doesn't seem to help much.
In their paper, they use this architecture:
I've been dabbling with different networks. For example, I tried this:
But it doesn't seem to help.
What is the intuition for next steps? Should I play around with my features (e.g. normalization or something else)? How do I optimize the network without just trying random things?
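To make the normalization idea concrete, here is one candidate I could try, sketched under the assumption of a European call with no dividends. The Black-Scholes price is homogeneous of degree one in the spot and strike, so dividing both the spot and the price by the strike removes one input. The helper names `call_price`, `to_normalized` and `from_normalized` are hypothetical.

```python
import math
from statistics import NormalDist


def call_price(S, K, T, r, sigma):
    """Black-Scholes European call, no dividends (assumed setting)."""
    cdf = NormalDist().cdf
    d1 = (math.log(S / K) + (r + 0.5 * sigma ** 2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    return S * cdf(d1) - K * math.exp(-r * T) * cdf(d2)


def to_normalized(S, K, T, r, sigma):
    """Network inputs with the strike scaled out: (S/K, T, r, sigma)."""
    return (S / K, T, r, sigma)


def from_normalized(predicted_ratio, K):
    """Map a predicted price-to-strike ratio back to a price."""
    return predicted_ratio * K


S, K, T, r, sigma = 120.0, 80.0, 0.5, 0.03, 0.25

# Training target in normalized form: price divided by strike.
target_ratio = call_price(S, K, T, r, sigma) / K

# Homogeneity check: pricing at spot S/K and strike 1 gives the same ratio.
features = to_normalized(S, K, T, r, sigma)
ratio_from_features = call_price(features[0], 1.0, *features[1:])

print(abs(target_ratio - ratio_from_features) < 1e-9)
```

With this scaling the network would learn a function of four inputs instead of five, and the targets are on a comparable scale across strikes. Whether that actually reduces the loss here is something I would still have to test.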
EDIT: By "optimize" I mean reducing the out-of-sample loss.