Suppose I have a response vector and a factorial design (for simplicity, assume it’s a one-way ANOVA with two treatments). A few Generalized Linear Models (Poisson, Negative Binomial, etc.) are fitted to the data. This is done separately for each of K experimental units. Each unit has a distinct response vector, but the design matrix is the same across units.
There are several ways to decide which particular GLM (i.e., which response distribution) to use for each unit. For example, the choice could be determined by AIC separately for each unit. However, such unit-specific model selection amounts to spending a larger number of effective parameters, so using the same GLM for all units may work better overall.
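To make the per-unit AIC comparison concrete, here is a minimal sketch for one unit, assuming a one-way design with two treatments. It exploits the fact that, in such a design, the fitted cell means are just the sample cell means for both the Poisson and the NB2 model, so only the NB dispersion needs numerical optimization. All function and variable names are illustrative, not from the question.

```python
import numpy as np
from scipy.special import gammaln
from scipy.optimize import minimize_scalar

def poisson_loglik(y, mu):
    # Poisson log-likelihood evaluated at the MLE cell means.
    return np.sum(y * np.log(mu) - mu - gammaln(y + 1))

def nb_loglik(y, mu, alpha):
    # NB2 log-likelihood: Var(y) = mu + alpha * mu^2, with r = 1/alpha.
    r = 1.0 / alpha
    return np.sum(gammaln(y + r) - gammaln(r) - gammaln(y + 1)
                  + r * np.log(r / (r + mu)) + y * np.log(mu / (r + mu)))

def aic_per_unit(y, groups):
    # groups: integer treatment labels; fitted means are the per-cell means
    # for both families, so we only profile out the NB dispersion alpha.
    mu = np.array([y[groups == g].mean() for g in np.unique(groups)])[groups]
    k = len(np.unique(groups))                      # one mean per cell
    aic_pois = -2 * poisson_loglik(y, mu) + 2 * k
    res = minimize_scalar(lambda a: -nb_loglik(y, mu, a),
                          bounds=(1e-6, 100.0), method="bounded")
    aic_nb = 2 * res.fun + 2 * (k + 1)              # one extra parameter
    return {"poisson": aic_pois, "negbin": aic_nb}

rng = np.random.default_rng(0)
groups = np.repeat([0, 1], 20)
y = rng.poisson(5.0, size=40)                       # equidispersed null data
print(aic_per_unit(y, groups))
```

The unit-specific strategy picks whichever AIC is smaller; the shared-GLM strategy would instead sum each family's AIC (or log-likelihood) across all K units before comparing.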
The problem is that there is no treatment effect in any of the units, so I can’t construct a ROC curve: that would require both true positives and true negatives, and the former are absent here. What I can and will do is check whether the test size is preserved, i.e. a “good” strategy should reject the null for about 5% of the units when testing for the treatment effect at the cutoff p-value = 0.05.
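The test-size check can be sketched as a small simulation. This is a minimal illustration, assuming the data are Poisson under the null and the treatment effect is tested with a likelihood-ratio test (chi-squared, 1 df); the names and the choice of test are mine, not from the question.

```python
import numpy as np
from scipy.stats import chi2

def poisson_lrt_pvalue(y, groups):
    # LRT: common mean vs. one mean per treatment (Poisson one-way layout).
    def loglik(y_, mu):
        return np.sum(y_ * np.log(mu) - mu)        # constant term cancels
    mu0 = y.mean()
    mu1 = np.where(groups == 0, y[groups == 0].mean(), y[groups == 1].mean())
    stat = 2 * (loglik(y, mu1) - loglik(y, mu0))
    return chi2.sf(stat, 1)

rng = np.random.default_rng(1)
K, n = 2000, 20                                    # units, obs per treatment
groups = np.repeat([0, 1], n)
pvals = np.array([poisson_lrt_pvalue(rng.poisson(5.0, 2 * n), groups)
                  for _ in range(K)])
print((pvals < 0.05).mean())                       # should be near 0.05
```

In the actual study, the per-unit model (and hence the test) would be whatever each selection strategy chooses, and the rejection fraction across the K units is compared with the nominal 5%.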
I am wondering whether it’s possible to perform a more direct, out-of-sample test. Note that the predicted response equals the corresponding cell mean regardless of which GLM is used. Therefore, I will have to compare not the accuracy of predicting a future response, but the accuracy of predicting the variance of the response (or some other statistic).
For example, suppose for each unit I have 20 observations per treatment. I use 10 observations per treatment to fit the candidate GLMs; each GLM produces an estimate of the response variance in each cell. Using the other half of the sample, I compute the observed variance in each cell and a discrepancy between the observed and predicted values. The discrepancies are then summed across all units.
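The split-half variance check above can be sketched as follows for a single unit, assuming the discrepancy is the squared difference between the model-implied cell variance and the held-out sample variance. The NB dispersion estimate here is a simple method-of-moments plug-in; everything in this sketch is an assumption of mine, not part of the question.

```python
import numpy as np

def predicted_variance(y_fit, model):
    # Per-cell variance implied by the fitted model, from the training half.
    # Poisson: Var = mu.  NB2: Var = mu + alpha * mu^2.
    mu = y_fit.mean()
    if model == "poisson":
        return mu
    alpha = max((y_fit.var(ddof=1) - mu) / mu**2, 0.0)  # moment estimate
    return mu + alpha * mu**2

def discrepancy(y, groups, model, rng):
    # Random half-split within each cell; sum squared variance errors.
    total = 0.0
    for g in np.unique(groups):
        yc = rng.permutation(y[groups == g])
        half = len(yc) // 2
        v_pred = predicted_variance(yc[:half], model)
        v_obs = yc[half:].var(ddof=1)
        total += (v_pred - v_obs) ** 2
    return total

rng = np.random.default_rng(2)
groups = np.repeat([0, 1], 20)
y = rng.poisson(5.0, 40)
for model in ("poisson", "negbin"):
    print(model, discrepancy(y, groups, model, rng))
```

Summing `discrepancy` over the K units would give the overall out-of-sample score for each model-selection strategy; averaging over repeated random splits would reduce the split-to-split noise.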
If you have seen something like that before, please provide suggestions and references.