My apologies if this is poorly framed or mis-worded, but I've been mildly bugged by a question to which I haven't found a satisfactory answer. I can't say I've seen this discussed in my discipline, which is why I feel like I've been searching in the dark here.
In sum, I was recently statistically evaluating different model fits using information criteria. That is, I had a dataset and I would fit models with different additive terms. I typically see model selection on linear models where the model terms are linear functions of predictor variables $X_i$, such as the $\beta _iX_i$ terms in the example $\hat{Y} = \beta _0 + \beta _1X_{1} + \beta _2X_{2} + \beta _3X_{3}$.
Using information criteria, one can evaluate the elegance of a model: how much information it explains relative to its complexity. From what I've seen, complexity increases with each free parameter in the model (e.g., each $\beta _i$ in the example above).
As I began to include nonlinear terms, I started to wonder whether counting additive terms with free parameters really captures complexity: what about the complexity of the nonlinear terms themselves? From what I understand, a linear function is measured as being just as complex as a highly nonlinear function, so long as the two have the same number of free parameters. For example,
$$\hat{Y} = \beta _0 + \beta _1X_{1} + \beta _2X_{2} + \beta _3X_{3}$$
has the same model complexity as
$$\hat{Y} = \beta _0 + \beta _1X_{1}^2 + \frac{\sin\left(\beta _2X_{2}\right)}{1 + X_{2}^3} + \beta _3\log \left(X_{3}^{-1}\right).$$
I understand these two models to be the same in terms of the complexity of statistical fitting, but the second is much more mathematically complex.
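To make my confusion concrete, here is a minimal sketch (assuming least-squares fits with Gaussian errors, so AIC can be written in terms of the residual sum of squares; the sample size and RSS values are hypothetical, just for illustration). The criterion only sees the parameter count $k$ and the achieved fit, never the functional form, so both models above, each with $k = 4$ coefficients $\beta_0, \ldots, \beta_3$, score identically whenever their fits are equal:

```python
import math

def aic(n, k, rss):
    """AIC for a least-squares fit with Gaussian errors:
    n * ln(RSS / n) + 2 * k (up to an additive constant)."""
    return n * math.log(rss / n) + 2 * k

# Both models above have k = 4 free parameters (beta_0 .. beta_3).
# Suppose each achieves the same residual sum of squares on n = 100 points
# (hypothetical numbers chosen only for illustration):
n, k, rss = 100, 4, 12.5

aic_linear = aic(n, k, rss)     # the purely linear model
aic_nonlinear = aic(n, k, rss)  # the sin/log/quadratic model

# The criterion cannot tell them apart: same k and same fit -> same AIC.
print(aic_linear == aic_nonlinear)
```

Nothing in the formula penalizes the $\sin$, $\log$, or reciprocal-cubic structure; only an extra free parameter (or a worse fit) would change the score.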
I haven't fully reasoned out why the latter model seems more complex, or why, if both explain the data equally well, I'd argue that the former is more elegant and parsimonious than the second. My gut and brain are telling me that the second is more complex, but I just need some guidance as to why that may or may not be the case.