I am modelling a nonlinear stochastic process and have data to compare model output against. My aim is to obtain an evolution equation of the form,
$$\frac{du}{dt} = f(u,\theta_f)+\alpha(u,\theta_\alpha)\xi(t),$$
($f$: a nonlinear function, $\xi(t)$: Gaussian noise, $\alpha$: a linear function, $\theta_{i}$: parameters) that can satisfactorily reproduce the observed probability distribution of $u$. To this end, I fit the parameters with an optimisation algorithm that minimises the mean squared difference between the histograms of the steady-state model output and of the data. Alternative models are obtained by choosing different functional forms of $f$.
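To make the fitting objective concrete, here is a minimal sketch rather than my actual code. It assumes a hypothetical cubic drift $f(u)=a u - b u^3$, a linear noise amplitude $\alpha(u)=c_0 + c_1 u$, white Gaussian noise in the Itô sense, and an Euler-Maruyama discretisation. The function names `simulate` and `histogram_mse`, and the choice of relative frequencies for $h$, are illustrative assumptions.

```python
import numpy as np

def simulate(theta_f, theta_alpha, n_steps=20000, dt=1e-2, u0=0.0, seed=0):
    """Euler-Maruyama integration of du = f(u) dt + alpha(u) dW (Ito, assumed)."""
    rng = np.random.default_rng(seed)
    a, b = theta_f            # hypothetical drift parameters: f(u) = a*u - b*u**3
    c0, c1 = theta_alpha      # linear noise amplitude: alpha(u) = c0 + c1*u
    u = np.empty(n_steps)
    u[0] = u0
    # Wiener increments with variance dt
    dW = rng.standard_normal(n_steps - 1) * np.sqrt(dt)
    for i in range(n_steps - 1):
        drift = a * u[i] - b * u[i] ** 3
        diffusion = c0 + c1 * u[i]
        u[i + 1] = u[i] + drift * dt + diffusion * dW[i]
    return u

def histogram_mse(u_model, u_data, edges):
    """Mean squared difference between relative-frequency histograms."""
    c_model, _ = np.histogram(u_model, bins=edges)
    c_data, _ = np.histogram(u_data, bins=edges)
    h_model = c_model / c_model.sum()   # normalised frequency per bin
    h_data = c_data / c_data.sum()
    return np.mean((h_model - h_data) ** 2)
```

An optimiser would then minimise `histogram_mse(simulate(theta_f, theta_alpha), u_data, edges)` over the parameters, ideally after discarding an initial transient so that only steady-state output enters the histogram.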
For selection between these alternative models, I use the $AIC$. Usually, the $AIC$ is calculated from the variable of interest $u$ itself (see 'Usual procedure'). I calculate the $AIC$ differently, from histograms of the data and model output of $u$ (see 'My approach').
Usual procedure:
In e.g. linear regression, the model errors $\epsilon$ are assumed to be normally distributed with variance $\sigma_\epsilon^2$, such that, up to an additive constant,
$$AIC=n \ln(\hat\sigma_\epsilon^2) + 2k,$$
where $\hat\sigma^2_\epsilon$ is obtained by
$$\hat\sigma_\epsilon^2=\frac{1}{n}\sum_{i=1}^{n}[\hat u_{i}(\theta)-u_{i}]^{2},$$
where $\hat u_{i}(\theta)$ are the model estimates, $u_{i}$ the observations and $n$ the number of observations.
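For reference, the usual procedure can be sketched in a few lines. The function name `aic_gaussian` is hypothetical, and the constant terms of the Gaussian log-likelihood are dropped, as in the formula above.

```python
import numpy as np

def aic_gaussian(u_hat, u_obs, k):
    """AIC = n*ln(sigma2_hat) + 2k for normally distributed errors (constants dropped)."""
    u_hat = np.asarray(u_hat, dtype=float)
    u_obs = np.asarray(u_obs, dtype=float)
    n = u_obs.size
    # Maximum-likelihood estimate of the error variance
    sigma2_hat = np.mean((u_hat - u_obs) ** 2)
    return n * np.log(sigma2_hat) + 2 * k
```

Only differences in $AIC$ between models fitted to the same observations are meaningful, so the dropped constants do not affect the ranking.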
My approach:
I calculate the $AIC$ based on the histograms of model output and observations, from the mean squared difference mentioned above, i.e.
$$\hat\sigma_\epsilon^2=\frac{1}{m}\sum_{j=1}^{m}\{ h[\hat u_j(\theta)]-h(u_{j})\}^{2}.$$
Here, $h$ is the normalised frequency in bins with bin center $u_j$, $j\in \{1,\dots,m\}$. Note that $k$ is the number of parameters, i.e. the number of elements in the set $\theta=\theta_f \cup \theta_\alpha$.
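A minimal sketch of this histogram-based calculation follows. It makes two assumptions that are not fixed by the text above: $h$ is taken as the relative frequency per bin (counts divided by the total count), and the number of bins $m$ takes the role of $n$ in the $AIC$ formula. The function name `histogram_aic` is hypothetical.

```python
import numpy as np

def histogram_aic(u_model, u_data, edges, k):
    """Histogram-based AIC: m*ln(sigma2_hat) + 2k, with m the number of bins (assumed)."""
    c_model, _ = np.histogram(u_model, bins=edges)
    c_data, _ = np.histogram(u_data, bins=edges)
    h_model = c_model / c_model.sum()   # normalised frequency of model output
    h_data = c_data / c_data.sum()      # normalised frequency of observations
    m = h_data.size
    # Mean squared difference between the two histograms
    sigma2_hat = np.mean((h_model - h_data) ** 2)
    return m * np.log(sigma2_hat) + 2 * k
```

Note that in this form the value depends on the choice of bin edges, so candidate models would need to be compared on identical bins.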
My question:
Is this alternative way of calculating the $AIC$ valid and is it justified for my problem? Are there better ways of doing this?