
Recently, I watched this video on YouTube on solving ODEs/PDEs with neural networks, and it motivated me to write short code in Keras. I also believe the video is referencing this paper, found here.

I selected an example ODE: $$ \frac{\partial^2 x(t)}{\partial t^2} + 14 \frac{\partial x(t)}{\partial t} + 49x(t) = 0 $$

with initial conditions $$ x(0) = 0, \ \frac{\partial x(t)}{\partial t}\rvert_{t=0} = -3 $$

According to the video, if I understand correctly, we let the neural network $\hat{x}(t)$ be the solution of our ODE, so $x(t) \approx \hat{x}(t)$.

Then we minimize the ODE residual, which serves as our custom cost function, so to speak. Since we have initial conditions, I created a piecewise loss for individual data points:

At $t = 0$: $$ loss_i = \left( \frac{\partial^2 \hat{x}(t_i)}{\partial t^2} + 14 \frac{\partial \hat{x}(t_i)}{\partial t} + 49\hat{x}(t_i) \right)^2 + \left( \frac{\partial \hat{x}(t_i)}{\partial t} + 3 \right)^2 + \left( \hat{x}(t_i) \right)^2 $$

Otherwise: $$ loss_i = \left( \frac{\partial^2 \hat{x}(t_i)}{\partial t^2} + 14 \frac{\partial \hat{x}(t_i)}{\partial t} + 49\hat{x}(t_i) \right)^2 $$

Then we minimize the batch loss $$ \min \frac{1}{b} \sum_{i=1}^{b} loss_i $$

where $b$ is the batch size in training.
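For reference, a minimal sketch of this loss in TensorFlow/Keras; the architecture shown is only illustrative, and the derivatives come from nested `GradientTape`s:

```python
import tensorflow as tf

# Illustrative candidate solution x_hat(t): one input, one output.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation="sigmoid", input_shape=(1,)),
    tf.keras.layers.Dense(1),
])

def batch_loss(t):
    """Mean per-point loss over a batch of sampled times t, shape (b, 1)."""
    with tf.GradientTape() as tape2:
        tape2.watch(t)
        with tf.GradientTape() as tape1:
            tape1.watch(t)
            x = model(t)              # x_hat(t)
        dx = tape1.gradient(x, t)     # d x_hat / dt
    d2x = tape2.gradient(dx, t)       # d^2 x_hat / dt^2

    # ODE residual term, applied at every point.
    loss = tf.square(d2x + 14.0 * dx + 49.0 * x)

    # Extra penalties only where t == 0 (x(0) = 0, x'(0) = -3).
    at_zero = tf.cast(tf.equal(t, 0.0), t.dtype)
    loss += at_zero * (tf.square(dx + 3.0) + tf.square(x))
    return tf.reduce_mean(loss)
```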

Unfortunately, the network always learns zero. Judging by the evidence, the first and second derivatives of the prediction are very small, while the coefficient on $x$ is large (i.e. $49$), so the network learns that outputting zero is a good minimizer. (Note that the exact solution here is $x(t) = -3te^{-7t}$, which stays close to zero anyway, so the constant-zero output is nearly optimal for the residual term.)


There is a chance that I am misinterpreting the video, because I believe my code is correct. If someone can shed some light, I will truly appreciate it.

Is my cost function correct? Do I need some other transformation?

Update:

I managed to improve the training by removing the conditional cost function. What was happening was that points with $t = 0$ appeared very infrequently, so the network was not adjusting enough for the initial conditions.

By changing the cost function to the following, the network now has to satisfy the initial conditions on every step:

$$ loss_i = \left( \frac{\partial^2 \hat{x}(t_i)}{\partial t^2} + 14 \frac{\partial \hat{x}(t_i)}{\partial t} + 49\hat{x}(t_i) \right)^2 + \left( \frac{\partial \hat{x}(t)}{\partial t}\biggr\rvert_{t=0} + 3 \right)^2 + \left( \hat{x}(0) \right)^2 $$
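As a sketch, reusing the illustrative `model` from above, the only change in code is that the initial-condition terms are evaluated at a fixed $t = 0$ on every batch:

```python
def batch_loss_v2(t):
    """ODE residual over the batch, plus initial-condition terms
    re-evaluated at t = 0 on every training step."""
    with tf.GradientTape() as tape2:
        tape2.watch(t)
        with tf.GradientTape() as tape1:
            tape1.watch(t)
            x = model(t)
        dx = tape1.gradient(x, t)
    d2x = tape2.gradient(dx, t)
    residual = tf.reduce_mean(tf.square(d2x + 14.0 * dx + 49.0 * x))

    # Initial conditions x(0) = 0 and x'(0) = -3, enforced every step.
    t0 = tf.zeros((1, 1))
    with tf.GradientTape() as tape:
        tape.watch(t0)
        x0 = model(t0)
    dx0 = tape.gradient(x0, t0)
    ic = tf.squeeze(tf.square(dx0 + 3.0) + tf.square(x0))
    return residual + ic
```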

The results are not perfect, but they are better. I have not managed to get the loss close to zero. Deep networks have not worked at all; only a shallow one with sigmoid activations and lots of epochs has.

Highlight:

I am surprised this works at all, since the cost function depends on derivatives with respect to the input $t$, which is not trainable. This is interesting to me; I would love to hear some insight.


I would appreciate any input on improving the solution. I have seen a lot of fancy methods, but this is the most straightforward. For example, in the paper referenced above, the authors use a trial solution. I do not understand how that works at all.

Results:

  • Method A = method described above

  • Method B = method described in the accepted answer

  • Shallow = one layer, 1024 nodes, Gaussian activation, with $b=2$

  • Deep = three layers, 10 nodes each, sigmoid activation throughout (both are sketched in code after this list)
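The two architectures might be defined as follows; Keras has no built-in Gaussian activation, so the usual $\exp(-x^2)$ form is an assumption here:

```python
import tensorflow as tf

# Gaussian activation is not built into Keras; assuming exp(-x^2).
def gaussian(x):
    return tf.exp(-tf.square(x))

# Shallow: one hidden layer, 1024 nodes, Gaussian activation.
shallow = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation=gaussian, input_shape=(1,)),
    tf.keras.layers.Dense(1),
])

# Deep: three hidden layers, 10 nodes each, sigmoid activation.
deep = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="sigmoid", input_shape=(1,)),
    tf.keras.layers.Dense(10, activation="sigmoid"),
    tf.keras.layers.Dense(10, activation="sigmoid"),
    tf.keras.layers.Dense(1),
])
```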


The transform-based method B appears to work better. Method A may still come in handy as a control, or when the boundaries are very difficult to model with a trial function, or when not solving on a rectangular domain.

I think both methods can be improved with better domain sampling instead of a random shuffle, for example different sampling for the boundaries and different sampling for the collocation points inside the domain.
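For example, something along these lines (the split sizes and the domain $[0, 2]$ are arbitrary placeholders):

```python
# Hypothetical stratified sampling: pin a few points to the boundary t = 0
# and spread the rest over the interior, instead of one random shuffle.
t_boundary = tf.zeros((8, 1))                      # initial-condition point
t_interior = tf.random.uniform((56, 1), 0.0, 2.0)  # interior collocation points
t_batch = tf.concat([t_boundary, t_interior], axis=0)
```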

Edv Beq
  • When the target is not all zeros, predicting all zeros is strongly suggestive of a programming error. We have some suggestions on how to go about debugging a neural network in the linked thread; in particular, I recommend constructing unit tests for each method of your class. As a starting point, I suggest beginning with a dirt-simple ODE/PDE and building up complexity from there. – Sycorax Oct 25 '20 at 16:49
  • @Sycorax Can you please consider reopening. This is not a duplicate - you can see the loss function I added. This is a specific question about duplicating the results of the video. – Edv Beq Oct 25 '20 at 17:01
  • Even if it's not a duplicate, it's not on-topic per the [help] because this question is primarily concerned with debugging code. In particular, the loss going to zero but the model making poor & constant predictions is strongly suggestive of a bug in your code. If you could edit your post to focus on an on-topic statistical question, not debugging code, I will re-open. – Sycorax Oct 25 '20 at 17:11
  • @Sycorax I removed all the code. Thank you. I only added the code to support the interpretation. – Edv Beq Oct 25 '20 at 17:15
  • Thanks! I think this is an interesting question (+1). – Sycorax Oct 25 '20 at 17:24
  • I'm impressed by the improvement that you achieved using this new loss function. It looks like you're on the right track. In terms of getting the loss to zero, you've noted that *deeper* networks are not effective. Have you explored using *wider* networks? For instance, what happens if you double the number of hidden units? – Sycorax Oct 28 '20 at 15:45
  • @Sycorax I did that and was able to improve but up to a certain point. Not much roi after a certain number. – Edv Beq Oct 28 '20 at 15:52
  • Taking a quick look at the paper, the method presented there seems different from what you are doing. In the paper they make an ansatz that explicitly fulfills the initial conditions. In your case this would be $\Psi(t)=-3t+t^2\hat{x}(t)$, where $\hat{x}(t)$ is the neural net. The cost function you need to minimize is then $\sum_i(\Psi''(t_i)+14\Psi'(t_i)+49\Psi(t_i))^2$. – sebhofer Oct 29 '20 at 10:08
  • @sebhofer Thank you. I still have trouble with this. I do not understand how to come up with that form of the equation. Also, is there something after the minus sign in your cost function? I was following the video - but in the video he references the paper. – Edv Beq Oct 29 '20 at 14:59
  • @sebhofer Explaining how the paper constructs their function - would qualify as improving the solution so you can get the point. You can post it as an answer. Thank you/ – Edv Beq Oct 29 '20 at 15:11
  • Ohh, I see: because when you take the derivative, you want a $t$ left there to get rid of that part of the equation. – Edv Beq Oct 29 '20 at 19:57
  • I added an answer. Let me know if there's still something unclear. I would be interested to see plots of your final solution, pls post some! – sebhofer Nov 02 '20 at 11:24
  • @EdvBeq Could you please provide the working code? I have been trying to reproduce it but I failed. – Matheus Manzatto Dec 23 '20 at 15:57
  • @MatheusManzatto Sorry for the late reply. I have the inverse version of this (meaning I'm also solving for the parameters), but you can bring it back to the above. Here is my GitHub: https://github.com/uninstallit/CS584/tree/master/Final_Project_Beqari_and_Williamson/src – Edv Beq Jan 14 '21 at 00:35

1 Answer


The procedure presented in the paper is slightly different from the one above. In the paper the authors make an ansatz that explicitly fulfills the initial conditions. For a second-order differential equation of the form $$ \Psi''(t)=f(t,\Psi(t),\Psi'(t)) $$ with $\Psi(0)=A$ and $\Psi'(0)=B$, they suggest using (see section 3.1 and specifically equation (13) in the preprint) $$\Psi(t)=A+Bt+t^2N(t),$$ where $N(t)$ is the neural net. Note that this form is not unique, but it has the correct initial values no matter what $N(0)$ is. The cost function to optimize, on the other hand, is $$ C=\sum_i\left(\Psi''(t_i)-f(t_i,\Psi(t_i),\Psi'(t_i))\right)^2, $$ where $\{t_i\}_i$ is a set of collocation points sampled from the domain of $\Psi$. So for your example problem you have $A=0$, $B=-3$, and $C=\sum_i(\Psi''(t_i)+14\Psi'(t_i)+49\Psi(t_i))^2$.
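A minimal TensorFlow/Keras sketch of this construction (the network size, domain, and batch size below are just placeholders):

```python
import tensorflow as tf

# N(t): the free network inside the ansatz; architecture is illustrative.
net = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="sigmoid", input_shape=(1,)),
    tf.keras.layers.Dense(1),
])

A, B = 0.0, -3.0  # x(0) = 0, x'(0) = -3

def psi(t):
    """Trial solution Psi(t) = A + B*t + t^2 * N(t).
    Satisfies the initial conditions for any N."""
    return A + B * t + tf.square(t) * net(t)

def cost(t):
    """C = sum_i (Psi''(t_i) + 14 Psi'(t_i) + 49 Psi(t_i))^2."""
    with tf.GradientTape() as tape2:
        tape2.watch(t)
        with tf.GradientTape() as tape1:
            tape1.watch(t)
            p = psi(t)
        dp = tape1.gradient(p, t)
    d2p = tape2.gradient(dp, t)
    return tf.reduce_sum(tf.square(d2p + 14.0 * dp + 49.0 * p))

# One illustrative training step on random collocation points in [0, 2].
optimizer = tf.keras.optimizers.Adam()
t_batch = tf.random.uniform((64, 1), 0.0, 2.0)
with tf.GradientTape() as tape:
    loss = cost(t_batch)
grads = tape.gradient(loss, net.trainable_variables)
optimizer.apply_gradients(zip(grads, net.trainable_variables))
```

Because the ansatz already pins down $\Psi(0)$ and $\Psi'(0)$, the cost contains only the residual term, and the optimizer cannot trade off the initial conditions against the differential equation.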

sebhofer
  • @EdvBeq It seems that you have additional questions about pde/ode models. This is great, but the best way to ask a question is to use the Ask Question button, not write a comment. – Sycorax Nov 02 '20 at 17:48
  • Added result graphs. The trial solution works well. – Edv Beq Nov 05 '20 at 14:00