
I am working on a project where we perform non-response adjustment by weighting survey respondents by their probability of response. To do this, we need to estimate each respondent's probability of response using a model (typically a logistic regression). Essentially, after receiving all of our survey responses we have a subset who responded (1s) and a subset that didn't (0s); the goal of the model is to take this information and turn the 1s and 0s into probabilities.
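Concretely, the setup looks something like the following sketch (Python, with made-up data and hypothetical column names such as `age` and `urban`, not our actual covariates or software):

```python
# Minimal sketch of a response-propensity model (made-up data and
# hypothetical column names, not the project's actual covariates).
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 1000

# Full sample: everyone we attempted to survey, with known characteristics.
sample = pd.DataFrame({
    "age": rng.integers(18, 80, size=n),
    "urban": rng.integers(0, 2, size=n),
})

# Observed after fieldwork: 1 = responded, 0 = did not respond.
true_propensity = 1 / (1 + np.exp(-(-1.0 + 0.02 * sample["age"].to_numpy())))
responded = rng.binomial(1, true_propensity)

# Model P(response | characteristics) instead of keeping the raw 1s and 0s.
model = LogisticRegression().fit(sample[["age", "urban"]], responded)
p_response = model.predict_proba(sample[["age", "urban"]])[:, 1]

# Non-response adjustment: each respondent is weighted by 1 / P(response).
nonresponse_weight = 1.0 / p_response[responded == 1]
```

The fitted probabilities replace the raw 1s and 0s when the non-response weights are formed.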

My colleague was describing the current method of building the model: they use step-wise regression to select it, with no cross-validation or holdout procedure involved. I was going to mention that step-wise methods are generally frowned upon and amount to data dredging, but then I thought: maybe it doesn't matter? If all I want to do is estimate the probabilities for my training data (I don't care to use this model on future data, and I don't care about analyzing the coefficients), does it matter if I am over-fitting the training data? Further, maybe that is actually my goal? Maybe I actually want to over-fit?

kjetil b halvorsen
astel
  • If all you're after are probabilities then you're already done. You have 1s and 0s, and those are probabilities. People who responded definitely responded and people who didn't definitely didn't. That's your answer, if you really don't care beyond having probabilities. – abstrusiosity Nov 17 '20 at 19:58
  • I think you're missing the point. In a survey, we weight respondents by the inverse of their sampling probability to account for the fact that we have a sample when we do our estimates. We are doing the same thing here, but weighting for non-response: we weight by the inverse of the probability of response. We don't do it based on whether they actually did respond, as you suggest (the same way we don't weight based on whether they were actually surveyed), but rather based on the probability that they would have responded given their characteristics, which is what I am after (see the sketch after these comments). – astel Nov 17 '20 at 20:31
  • Your last sentence clarifies things a bit. My previous (and snide) answer was motivated by the fact that overfitting the data would lead to probabilities of 0 and 1. Your comment says that you do (implicitly, at least) care about the interpretation of the coefficients since they tell you how to go from characteristics to response probability. That mapping would be harmed by overfitting. – abstrusiosity Nov 17 '20 at 20:40
  • What are you doing where you don't care about the ability to generalize? – Dave Nov 18 '20 at 16:25
  • @abstrusiosity: That looks like an answer; can you write it up as a formal answer, so the question does not linger on as unanswered? – kjetil b halvorsen Nov 18 '20 at 16:37
  • I like this question: it seems like in order to cancel a bias (non-response) you can get into a high-variance trap (an overfitted reweighting model). It's not like the ordinary bias-variance trade-off, though. – carlo Nov 18 '20 at 16:48
  • Consider replacing one of your tags (maybe `training-error`) with [`propensity-score`](https://stats.stackexchange.com/questions/tagged/propensity-scores) to get input from a highly qualified expert like @Noah [user 116195](https://stats.stackexchange.com/users/116195/noah). – EdM Nov 18 '20 at 17:30
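To make the weighting analogy in the comments concrete (a toy sketch with made-up numbers, not the actual survey): the design weight from the known sampling probability is multiplied by an inverse response-propensity adjustment, and the estimate is formed from respondents only using the combined weight.

```python
# Toy illustration of combining design weights with a non-response
# adjustment (made-up numbers, not the actual survey).
import numpy as np

p_sampled  = np.array([0.10, 0.10, 0.25, 0.25, 0.50])    # known sampling probabilities
p_response = np.array([0.80, 0.40, 0.60, 0.90, 0.50])    # fitted response propensities
responded  = np.array([1,    0,    1,    1,    0])        # observed response indicator
y          = np.array([3.0,  np.nan, 5.0, 2.0, np.nan])   # outcome, missing for non-respondents

# Combined weight: inverse sampling probability times inverse response propensity.
w = (1.0 / p_sampled) * (1.0 / p_response)

# Weighted mean of the outcome over respondents only.
mask = responded == 1
estimate = np.sum(w[mask] * y[mask]) / np.sum(w[mask])
print(round(estimate, 3))
```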

1 Answer


With this type of propensity-score modeling you have less reason to fear overfitting, but you can still take it too far. This paper, for example, concluded from simulation studies:

Overfitting of propensity score models should be avoided to obtain reliable estimates of treatment or exposure effects in individual studies.

If you are conducting a survey, you presumably want to apply the survey results to new cases, not just to describe the training set. Insofar as overfitting of the propensity-score model might make the survey results less applicable outside the training set, you need to take that into account.
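As a toy illustration of the point raised in the comments about probabilities collapsing to 0 and 1 (a sketch, not taken from the cited paper): a model that overfits the training data to the limit simply reproduces the observed 0/1 response labels, so the estimated "propensities" stop carrying information about who tends to respond, while a plain logistic regression keeps them strictly between 0 and 1.

```python
# Toy illustration (not from the cited paper): a model that overfits to the
# limit reproduces the 0/1 response labels, so the fitted "propensities"
# stop carrying information about who tends to respond.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n = 500
X = rng.normal(size=(n, 3))                 # respondent characteristics
true_p = 1 / (1 + np.exp(-X[:, 0]))         # true response propensity
responded = rng.binomial(1, true_p)

smooth  = LogisticRegression().fit(X, responded)
overfit = DecisionTreeClassifier().fit(X, responded)  # default settings let the tree grow until leaves are pure

p_smooth  = smooth.predict_proba(X)[:, 1]
p_overfit = overfit.predict_proba(X)[:, 1]

# The overfit model assigns every respondent probability 1 (and every
# non-respondent probability 0), so the adjustment weights become useless.
print("smooth  propensities (respondents):", np.round(p_smooth[responded == 1][:5], 2))
print("overfit propensities (respondents):", np.round(p_overfit[responded == 1][:5], 2))
```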

EdM
  • Interesting note from that paper: "In fact there is a wide-spread perception that the propensity score is meant to be only descriptive for the data in hand but not to be generalizable to other data sets". Which is pretty much the question I am asking. – astel Nov 18 '20 at 17:06
  • @astel which is why I thought it wise to show what can happen when you test that wide-spread perception. That said, the danger of moderate overfitting of propensities might not be too great. – EdM Nov 18 '20 at 17:11