I am using polynomial regression to predict mean occupancy in a hospital unit from average length of stay (LOS) and the arrival rate to the unit. I train the model on different percentages of the data (training-set sizes) and use 5-fold cross-validation to pick the best model. For each training-set percentage, the model is trained 20 different times, and the final models are then used to make predictions on the test set. However, when I look at the results of the 20 runs, the "average LOS" coefficients are highly variable from one model to the next (in magnitude, and sometimes with opposite signs). Is there a way to quantify this coefficient variability, or is there a way to measure "model stability"?
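For concreteness, this is roughly what I mean by "quantifying" the variability (a sketch with placeholder names: fits stands for the list of 20 fitted models and "avg_LOS" for the LOS coefficient):

# Sketch only: `fits` = list of the 20 fitted models, "avg_LOS" = coefficient name
b <- sapply(fits, function(m) coef(m)["avg_LOS"])

c(mean = mean(b),
  sd   = sd(b),                                      # spread in magnitude
  cv   = sd(b) / abs(mean(b)),                       # relative variability
  sign_agreement = max(mean(b > 0), mean(b < 0)))    # share of runs with the majority sign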

No point doing train/test/validation splits for polynomial regression. This is a "smooth" function (unlike regression trees), so you're better off using all the data to fit your model and making use of model averaging techniques. – probabilityislogic Jul 08 '18 at 02:21
2 Answers
Yes, there is. In a recent paper we present a framework for doing exactly what you describe.
The procedure is implemented in the R package stablelearner, which you can find on CRAN as the stablelearner package.
The basic idea is to use a resampling strategy that generates pairs of learning samples that do not overlap too much, together with an evaluation set that does not overlap with the learning samples at all. The model in question is then fitted to each learning sample, and the two estimated models generate a pair of predictions (one from estimated model 1, one from estimated model 2) for the evaluation data. A similarity measure is calculated for the prediction pairs and aggregated over all observations in the evaluation set. This is repeated B times, and a summary statistic or the whole "stability distribution" is used to gauge the stability. In the paper, 0.9-bootstrap resampling with out-of-bag evaluation is recommended.
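To make the idea concrete, here is a rough hand-rolled sketch of one such resampling loop (an illustration only, not the package implementation: it assumes a fitted lm/glm object called model, 90% bootstrap learning samples with out-of-bag evaluation, and Lin's concordance correlation coefficient as the similarity measure; ccc and stability_sketch are made-up names):

ccc <- function(x, y) {
  # Lin's concordance correlation coefficient between two prediction vectors
  2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2)
}

stability_sketch <- function(model, data, B = 500, frac = 0.9) {
  n <- nrow(data)
  replicate(B, {
    s1  <- sample(n, size = floor(frac * n), replace = TRUE)  # learning sample 1
    s2  <- sample(n, size = floor(frac * n), replace = TRUE)  # learning sample 2
    oob <- setdiff(seq_len(n), union(s1, s2))                 # evaluation set (out-of-bag)
    m1  <- update(model, data = data[s1, ])
    m2  <- update(model, data = data[s2, ])
    # similarity of the two models' predictions on the evaluation data
    ccc(predict(m1, newdata = data[oob, ]),
        predict(m2, newdata = data[oob, ]))
  })
}

The vector it returns plays the role of the "stability distribution"; its quantiles are what the package summarises in the similarity summary below.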
Polynomial regression is supported by the package (in the form of a linear model). Here's a toy example:
R> library(stablelearner)
R> data(iris)
R> model <- lm(Sepal.Length ~ Sepal.Width + I(Sepal.Width^2), data = iris)
R> set.seed(1)
R> stab <- stability(model, control = stab_control(sampler = bootstrap, v = 0.9))
R> summary(stab)
Call:
lm(formula = Sepal.Length ~ Sepal.Width + I(Sepal.Width^2), data = iris)

Sampler:
B = 500
Resampling method = Bootstrap sampling with 90.0% data
Evaluation method = OOB

Sampling summary:

                           avg min max
Learning sample size   135.000 135 135
Learning overlap        53.080  38  68
Evaluation sample size  24.428  15  35
Evaluation overlap       0.000   0   0

Similarity summary:

, , Concordance correlation coefficient

       5%    25%    50%    75%    95%
lm 0.0102 0.4355 0.6429 0.8169 0.9591

R> boxplot(stab)

I wanted to comment but don't have enough reputation to do so. Why would I be getting NA values for the Concordance Correlation Coefficient?
I tested it with two models:
m1 <- glm(count_formula, family = "quasipoisson", data = cohort,
          weights = weights)

where weights is just a column of 1's, and

m2 <- glm(count_formula, family = "quasipoisson", data = cohort,
          weights = weights)

where weights are calculated propensity scores.
Both models return NA values for the CCC when I run
stab <- stability(m1, control = stab_control(sampler = bootstrap, v = 0.9))

or

stab <- stability(m2, control = stab_control(sampler = bootstrap, v = 0.9))
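For reference, if the measure is Lin's concordance correlation coefficient as usually defined, an NA typically traces back to NA predictions or to variances that are undefined on the evaluation sample, so a quick check along these lines might narrow it down (a sketch reusing m1 and cohort from above, not package code):

# Rough check: Lin's CCC computed by hand; NA usually comes from NA predictions
# or from variances that cannot be computed (e.g. too few or constant values).
ccc <- function(x, y) 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))^2)

p1 <- predict(m1, newdata = cohort, type = "response")
any(is.na(p1))   # NA predictions propagate through cov()/var() and hence the CCC
ccc(p1, p1)      # should be exactly 1 on clean predictions; NA/NaN flags a problem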
