I am working on fitting an ODE model to some data. I have a vector of time series data $\textbf{x} = [x_1, x_2, \ldots, x_N]$ and an ODE model $\dot{x} = f(x, \theta)$, where $\theta$ is a vector of parameters. I have defined a simple squared loss function, and can add regularizers if needed (for example, to improve convergence speed). Integrating the ODE model gives a function $F(x, \theta)$, which ideally would resemble the data.
So the loss function looks something like:
$$ \mathcal{L}(x, \theta) = \sum_{i=1}^N (x_{i} - F(x_i, \theta))^2 $$
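To make the setup concrete, here is a minimal sketch of how such a loss could be evaluated. It is only an illustration, not my actual model: the logistic-growth right-hand side, the fixed-step RK4 integrator (used in place of a library solver to keep the example dependency-free), and the names `logistic_rhs`, `integrate_rk4`, and `squared_loss` are all assumptions, as is the choice to compare the integrated solution to the data at the observation times.

```python
def logistic_rhs(x, theta):
    """Hypothetical ODE right-hand side f(x, theta): logistic growth."""
    r, K = theta
    return r * x * (1.0 - x / K)


def integrate_rk4(f, x0, times, theta, substeps=20):
    """Integrate dx/dt = f(x, theta) with classical fixed-step RK4.

    Returns the state at every entry of `times`, starting from x0 at times[0].
    """
    xs = [x0]
    x = x0
    for t_prev, t_next in zip(times[:-1], times[1:]):
        h = (t_next - t_prev) / substeps
        for _ in range(substeps):
            k1 = f(x, theta)
            k2 = f(x + 0.5 * h * k1, theta)
            k3 = f(x + 0.5 * h * k2, theta)
            k4 = f(x + h * k3, theta)
            x += h * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        xs.append(x)
    return xs


def squared_loss(data, times, theta, x0):
    """Sum of squared residuals between the data and the integrated model."""
    predictions = integrate_rk4(logistic_rhs, x0, times, theta)
    return sum((d - p) ** 2 for d, p in zip(data, predictions))
```

Each candidate ODE model would have its own right-hand side and parameter vector, and the fitted loss for each is the minimum of `squared_loss` over $\theta$.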
Now, I have a few different variations on the ODE model that could work, and I want to understand the right criteria to use for model selection. I come from statistics, where we generally use something like AIC or BIC to measure goodness of fit discounted by model complexity (i.e., the number of parameters). Of course, AIC and BIC use a likelihood function instead of a simple loss function.
Hence I was wondering what the equivalent criterion to AIC/BIC would be for fitting an ODE to some data. Can I just use the AIC or BIC criterion with the loss function in place of the likelihood function? Or are there other concerns that I might not have accounted for?
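For reference, the version I have in mind is the usual least-squares form of these criteria. It relies on an assumption that I am not sure holds here: the residuals are i.i.d. Gaussian with unknown variance, in which case minimizing the squared loss is maximum likelihood and the maximized log-likelihood depends on the data only through the residual sum of squares. A minimal sketch under that assumption (the function name `aic_bic_from_rss` is hypothetical, and additive constants shared by all models are dropped):

```python
import math


def aic_bic_from_rss(rss, n_obs, n_params):
    """AIC and BIC for a least-squares fit.

    Assumes i.i.d. Gaussian errors with unknown variance, so that
    -2 * max log-likelihood = n_obs * log(rss / n_obs) + constant.
    The constant is the same for every model and is dropped.
    """
    if rss <= 0 or n_obs <= 0:
        raise ValueError("rss and n_obs must be positive")
    k = n_params + 1  # +1 counts the estimated noise variance
    neg2_loglik = n_obs * math.log(rss / n_obs)
    aic = neg2_loglik + 2 * k
    bic = neg2_loglik + k * math.log(n_obs)
    return aic, bic
```

Here `rss` would be the minimized value of $\mathcal{L}$ for a given model and `n_params` the length of its $\theta$. Counting the noise variance as an extra parameter is a convention; it shifts every model's score by the same amount, so it does not change the ranking. If a regularizer is added to the loss, the minimized value is no longer a plain residual sum of squares and this correspondence would not apply directly.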
Any suggestions would be helpful.