Imagine we have 10 input features/predictors, a large sample size, and the following two scenarios:
Scenario 1: the label/dependent variable is binomial (a binary classification problem).
Scenario 2: the dependent variable is continuous (regression).
Now, the criterion for choosing the best model is NOT about finding the fit with the fewest parameters among models of comparable performance, so the LRT, AIC, etc. aren't applicable here.
Instead, let's say what we're interested in is:
- Is there any evidence for pairwise or higher-order/complex interactions between features?
- In other words, do we gain anything from going more complex, and if so, what fraction of the performance is due to the added complexity?
To address this, for each scenario one would fit a simple multiple logistic/linear regression (main effects only) vs. a hierarchical model or even something like a neural net (interpretability is not important here; a black box is fine).
What is a suitable approach to compare these and conclude that there is a difference, however small, with some level of certainty about it (for example, a p-value and an effect size due to the inclusion of complex interactions)? A rough sketch of the comparison I have in mind is below.
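To make this concrete, here is a minimal sketch for scenario 1 (assuming scikit-learn; the synthetic data, the MLP as the "complex" model, AUC as the metric, and all hyperparameters are just illustrative placeholders, not my actual setup):

```python
# Compare a main-effects-only logistic regression with a model that can
# represent interactions/non-linearities, on the same CV folds.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data: 10 features, large-ish sample.
X, y = make_classification(n_samples=5000, n_features=10, random_state=0)

# Simple model: main effects only (no interactions).
simple = LogisticRegression(max_iter=1000)
# Complex model: can pick up interactions if they exist.
complex_model = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                              random_state=0)

# cv=10 uses the same deterministic stratified folds for both models,
# so the per-fold scores are paired.
x_scores = cross_val_score(simple, X, y, cv=10, scoring="roc_auc")
y_scores = cross_val_score(complex_model, X, y, cv=10, scoring="roc_auc")
print("simple:", x_scores.mean(), "complex:", y_scores.mean())
```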
UPDATE 1:
Assume we perform cross-validation, etc., to avoid overfitting.
Also, consider the context: people believe that the univariate effects of the features dominate the prediction, and I'm claiming that this is NOT true, or at least not that simple.
The actual data set is much bigger in terms of the number of features.
All I'm saying is that I want to compare a simple linear model with a complex non-linear model and show that there is a difference, and hence that there are interactions, etc. Finding those interactions is another question, for later.
If the simple model achieves a prediction performance of, say, $X$ (on some performance metric) and the more complex model achieves $Y$, is it reasonable to claim that a fraction $(Y-X)/Y$ of the performance comes from the complex interactions between features, since the more complex model includes the simpler one (like a logistic regression vs. a neural net)? If so, what is a correct way of quantifying that with uncertainty? A sketch of one possibility is below.
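One way I could imagine putting uncertainty on $(Y-X)/Y$ is to fit both models once on a training split and then bootstrap the held-out test set (again assuming scikit-learn; same placeholder data and models as above, and the choice of AUC and of bootstrapping the test cases are my assumptions):

```python
# Bootstrap the test set to get a confidence interval for the
# relative performance gain (Y - X) / Y of the complex model.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=5000, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

simple = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
complex_model = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                              random_state=0).fit(X_tr, y_tr)

p_simple = simple.predict_proba(X_te)[:, 1]
p_complex = complex_model.predict_proba(X_te)[:, 1]

rng = np.random.default_rng(0)
gains = []
for _ in range(2000):
    idx = rng.integers(0, len(y_te), len(y_te))  # resample test cases
    if len(np.unique(y_te[idx])) < 2:            # AUC needs both classes
        continue
    x_perf = roc_auc_score(y_te[idx], p_simple[idx])
    y_perf = roc_auc_score(y_te[idx], p_complex[idx])
    gains.append((y_perf - x_perf) / y_perf)

lo, hi = np.percentile(gains, [2.5, 97.5])
print(f"relative gain: {np.mean(gains):.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```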
UPDATE 2:
Label Permutation: Based on @Kodiologist's answer that the complex model will always be better: what if I frame this as a permutation test (permuting the labels)? The permuted data would give a good null distribution, because the complex model is still the better one in that setting, so I can compare the difference in performance on the actual/real data with the distribution of differences generated by the label permutations. A rough sketch follows.
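Here is roughly what I mean (same scikit-learn placeholders as above; the number of permutations, CV folds, and the one-sided p-value formula are my assumptions, and refitting both models per permutation is slow):

```python
# Permutation test: permute y, refit both models each time, and use the
# resulting performance gaps as a null distribution for the observed gap.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier

def perf_gap(X, y):
    """Mean CV AUC of the complex model minus the simple model."""
    simple = LogisticRegression(max_iter=1000)
    complex_model = MLPClassifier(hidden_layer_sizes=(32, 32), max_iter=2000,
                                  random_state=0)
    x_perf = cross_val_score(simple, X, y, cv=5, scoring="roc_auc").mean()
    y_perf = cross_val_score(complex_model, X, y, cv=5, scoring="roc_auc").mean()
    return y_perf - x_perf

X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
observed = perf_gap(X, y)

rng = np.random.default_rng(0)
null_gaps = [perf_gap(X, rng.permutation(y)) for _ in range(200)]

# One-sided p-value: how often a permuted-label gap is at least as large
# as the observed gap (with +1 smoothing).
p = (1 + sum(g >= observed for g in null_gaps)) / (1 + len(null_gaps))
print(f"observed gap = {observed:.3f}, permutation p = {p:.3f}")
```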