Here is a good, if partial, explanation of what AIC is. From that, we can see that the residuals are so frequently assumed to be Gaussian that one often forgets that this is not necessarily the case. When they are Gaussian, the only variable term in the log-likelihood, that is, the loss (or fit) term being optimized, is $-\frac{1}{2}(x-\mu)^T K^{-1} (x-\mu)$.
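As a sketch of where that term comes from, assume $x$ is an $n$-vector with mean $\mu$ and a fixed covariance matrix $K$, and let $k$ be the number of fitted parameters and $\hat{L}$ the maximized likelihood (the symbols $n$, $k$ and $\hat{L}$ are my notation, not from the linked explanation). Then

$$\ln L = -\frac{1}{2}(x-\mu)^T K^{-1} (x-\mu) - \frac{1}{2}\ln|K| - \frac{n}{2}\ln(2\pi), \qquad \mathrm{AIC} = 2k - 2\ln\hat{L}.$$

With $K$ held fixed, the last two terms of $\ln L$ are constants, which is why only the quadratic term matters for the optimization.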
This results in several problems: 1) Gaussian loss is optimistic, and not a given. 2) In general, loss (or reward) functions can have local minima or be indeterminate, and 3) they can be totally inappropriate to the physics of "loss" in the problems they model. It is more appropriate to model the loss function explicitly rather than just assume a Gaussian. For example, nuclear decay counting statistics are Poisson, not Gaussian.
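To illustrate the decay-counting point, here is a minimal sketch, assuming simulated counts and NumPy/SciPy, that computes AIC under an explicit Poisson likelihood and under a Gaussian one. The helper names `aic` and `compare_count_models`, the rate of 2.0 and the sample size are hypothetical choices for illustration. Note that comparing a probability mass with a probability density is only heuristic, even for integer data.

```python
import numpy as np
from scipy import stats


def aic(log_likelihood, n_params):
    """AIC = 2k - 2 ln(L) for a maximized log-likelihood."""
    return 2 * n_params - 2 * log_likelihood


def compare_count_models(counts):
    """Return AIC for Poisson and Gaussian models fitted to count data."""
    counts = np.asarray(counts)

    # Poisson: the ML estimate of the rate is the sample mean (1 parameter).
    rate = counts.mean()
    ll_poisson = stats.poisson.logpmf(counts, rate).sum()

    # Gaussian: ML estimates are the sample mean and the ddof=0 standard
    # deviation (2 parameters).
    mu, sigma = counts.mean(), counts.std()
    ll_gauss = stats.norm.logpdf(counts, mu, sigma).sum()

    return {"poisson": aic(ll_poisson, 1), "gaussian": aic(ll_gauss, 2)}


# Simulated decay counts (assumed rate; not real data).
rng = np.random.default_rng(0)
counts = rng.poisson(lam=2.0, size=2000)
result = compare_count_models(counts)
print(result)
```

The point is not the numbers themselves but that the likelihood is stated explicitly to match the counting process, rather than defaulting to a Gaussian.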
With respect to 1), the Gaussian assumption, there have been attempts to generalize AIC for mixture distributions. There has also been an attempt to generalize AIC to be distribution free for generalized estimating equations (GEE). So, if you want to use some other distribution, go ahead and do so.
However, AIC, AICc and BIC are often used while making specific Gaussian assumptions concerning likelihood that can sometimes be irrelevant to the physical problems under consideration. For example, let us consider AIC in the context of Tikhonov regularization. There are many criteria that can be applied to selecting smoothing factors for Tikhonov regularization. To use AIC in that context, there is a paper that makes rather specific assumptions as to how to perform that regularization: Information complexity-based regularization parameter selection for solution of ill conditioned inverse problems. Specifically, it assumes
"In a statistical framework, ...choosing the value of the regularization parameter α, and by using the maximum penalized likelihood (MPL) method....If we consider uncorrelated Gaussian noise with variance $\sigma ^2$ and use the penalty $p(x) =$ some norm, the MPL solution is the same as the Tikhonov (1963) regularized solution."
The question then becomes, why should anyone make those assumptions? For the problems I deal with, the residuals are too poorly behaved to make a Gaussian assumption. Even if one were to generalize the residual distribution, there is no known solution, so a penalty function cannot be specified. Moreover, a goodness-of-fit criterion will not allow for specifying the parameters that one wishes to extract from the modelling process. Goodness of fit and AIC pertain to the data in a leave-one-out (LOO) sense, not to extrapolation to infinite time from within an incomplete range of data, as occurs in a time series, for which we do not have infinite observation times.
An inverse solution that optimizes AUC (area under the curve) to infinite time can be found using Tikhonov regularization to minimize the relative error of whichever parameter one desires to measure, for a proportional error measurement system. See Tikhonov adaptively regularized gamma variate fitting to assess plasma clearance of inert renal markers. Inverse problems are not generally curve fitting, but they can be used to minimize propagated error, although admittedly they rarely are, because people often try to minimize arbitrary criteria without thinking through what they are trying to do. Thus, a proper inverse problem solution, one that has a stated goal related to the physical quantity being optimized, may allow identification of what likelihood assumptions should have been made to begin with. In general, one is better off not making that type of assumption a priori but deriving it a posteriori. For example, read this.
So, yes one can "use" AIC with Tikhonov regularization, but even if one does that correctly, it will not quantify a time series properly.
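For reference, the equivalence quoted above (uncorrelated Gaussian noise plus a norm penalty gives the Tikhonov solution) can be checked numerically. This is only a sketch under stated assumptions: the penalty is taken to be the squared Euclidean norm, the forward operator `A`, data `b`, noise level `sigma` and penalty weight `lam` are made up, and the function names are hypothetical. Under those assumptions the penalized likelihood optimum matches the closed-form Tikhonov solution with $\alpha = 2\sigma^2\lambda$.

```python
import numpy as np
from scipy.optimize import minimize


def tikhonov_solution(A, b, alpha):
    """Closed form for argmin ||A x - b||^2 + alpha * ||x||^2."""
    n = A.shape[1]
    return np.linalg.solve(A.T @ A + alpha * np.eye(n), A.T @ b)


def mpl_solution(A, b, sigma, lam):
    """Numerically maximize the penalized Gaussian log-likelihood.

    Minimizes ||A x - b||^2 / (2 sigma^2) + lam * ||x||^2, which is the
    negative penalized log-likelihood up to an additive constant.
    """
    def objective(x):
        r = A @ x - b
        return r @ r / (2 * sigma**2) + lam * (x @ x)

    def gradient(x):
        return A.T @ (A @ x - b) / sigma**2 + 2 * lam * x

    x0 = np.zeros(A.shape[1])
    return minimize(objective, x0, jac=gradient, method="BFGS").x


# Made-up, well-conditioned test problem (assumption for illustration).
rng = np.random.default_rng(0)
A = rng.normal(size=(20, 3))
b = A @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.5, size=20)
sigma, lam = 0.5, 1.0

x_mpl = mpl_solution(A, b, sigma, lam)
x_tik = tikhonov_solution(A, b, alpha=2 * sigma**2 * lam)
print(np.max(np.abs(x_mpl - x_tik)))  # difference between the two solutions
```

The agreement holds only because the noise model was chosen to be Gaussian and uncorrelated; with a different residual model the two solutions would in general differ, which is the assumption being questioned here.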
Here is a link that features some regression types, about a dozen or so out of many many others. AIC is only mentioned once and less frequently than BIC, with the comment "They (i.e., AIC and BIC) also tend to break when the problem is badly conditioned (more features than samples)." To use AIC we are still relegated to using MLR, and that is a severe limitation. One set of exact conditions for OLS that is unrelated to maximum likelihood includes having no x-axis data uncertainty, which applies only in special circumstances such as when exact equal-interval time series are plotted, as well as homoscedasticity. These requirements are often totally ignored, resulting in biased regression. That is, the intersection of OLS and MLR is small and in the linear case.
More time should be spent on the conditions that pertain to the physics of the problem being considered than on blindly applying procedures. This is often a sore point of contention between statistical procedures and the need to balance physical units or perform goal-oriented regression. Even those who study AIC intensely will admit that sometimes we stand to gain less useful information from it than we would from ANOVA with partial probabilities of parameters.
There is one paper that suggests that AIC relates to nested models, but there appears to be no agreement with that opinion; e.g., see Hyndman point #4.
Another dimension to this problem is that AIC really is not reliable for model selection, even in the restricted conditions in which it is touted to be useful. The only criteria that are generally portable between models of anything are the Pearson Chi-squared probabilities and, for density models, Cramér-von Mises probabilities. It is a Pyrrhic victory to have a better AIC index for a fit that has a probability p < 0.0001 of being correct. It would appear, for example, that in Mathematica 10.3 the ranking of models for finding distributions used a combined score from maximum likelihood. This now appears to have been changed to using BIC in their recent v11 release.
Finally, to use AIC we should:
1) Determine whether MLR is applicable to the problem type; for example, for extrapolation MLR is not the best regression to use.
2) Assume normally distributed (ND) residuals, then test the MLR residuals for ND. If they pass, continue; if not, change the MLR residual model and retest.
3) Calculate AIC once the MLR residuals agree with the MLR residual assumption.
However, other procedures should still be explored, especially since a better AIC value does not tell us if a model is useful.
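Steps 2) and 3) above might be sketched as follows. The assumptions are mine: a polynomial least-squares fit stands in for the regression (it is the maximum likelihood fit only if the residuals really are Gaussian), the Shapiro-Wilk test is one of several possible normality checks, and the function names, the 0.05 threshold and the simulated data are hypothetical.

```python
import numpy as np
from scipy import stats


def gaussian_aic(residuals, n_model_params):
    """AIC under i.i.d. Gaussian residuals with ML variance estimate."""
    r = np.asarray(residuals, dtype=float)
    n = r.size
    sigma2 = np.mean(r**2)  # ML estimate of the residual variance
    log_lik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = n_model_params + 1  # +1 because the variance is also estimated
    return 2 * k - 2 * log_lik


def aic_if_residuals_normal(x, y, degree, alpha=0.05):
    """Fit, test the residuals for normality, and only then report AIC.

    Returns (aic, p_value); aic is None when normality is rejected,
    signalling that the residual model should be changed and retested.
    """
    coeffs = np.polyfit(x, y, degree)
    residuals = y - np.polyval(coeffs, x)
    _, p_value = stats.shapiro(residuals)
    if p_value < alpha:
        return None, p_value
    return gaussian_aic(residuals, degree + 1), p_value


# Simulated data with strongly skewed (exponential) noise, so the
# Gaussian residual assumption is expected to fail the normality check.
rng = np.random.default_rng(0)
x = np.linspace(0, 10, 200)
y_skewed = 1.0 + 2.0 * x + rng.exponential(scale=1.0, size=x.size)
aic_value, p_value = aic_if_residuals_normal(x, y_skewed, degree=1)
print(aic_value, p_value)
```

Returning `None` rather than a number is a deliberate choice in this sketch: an AIC computed under a residual assumption that the data reject is not worth reporting.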