How to consider wage data in a logistic regression model when the values are bucketed?

Question

If I have data such as

wage :
       under 10K
       10K - 20K
       20K - 40K
       40K - 80K
       over 80K
       don't know
       rather not say

which I want to use in a model, is it right that I cannot use it as anything other than an ordinal variable? It feels natural to want to talk about wages increasing or decreasing, but since there are also the categories "don't know" and "rather not say", it seems that this is forced.

This post seems to support my thinking, but I wanted to check. Perhaps there is some basic literature covering this that could be recommended?

Why would a model predict whether a person answers "don't know" or "rather not say"? I think the five ordinal categories can be used, and the other two can simply be treated as "missing values" (carrying no information in themselves). — Nuclear03020704, Aug 04 '20 at 05:54

score 2 · Answer 1 · answered Aug 09 '20 at 14:56

You seem to have what could be called Categorical Variables With Partial or Tentative Ordering of Categories, but I do not know of regression models built directly for such data. You might treat the unordered categories as missing data, and then use (multiple) imputation.

But this "missingness" is probably not random, so you should first step back and investigate the pattern of missingness, for example by checking whether missingness can be predicted from your predictor variables. After imputation you can then use an ordinal regression model.
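
As a rough illustration of the first two steps (recoding the unordered answers as missing, then looking at how missingness varies with a predictor), here is a minimal sketch in plain Python. The helper names `encode_wage` and `missing_rate_by_group`, the `employed` predictor, and the toy rows are all invented for the example and are not from your data. It only tabulates missingness rates; it does not do the imputation or fit the ordinal model.

```python
from collections import defaultdict

# Ordered wage buckets from the question, lowest to highest.
ORDERED_LEVELS = ["under 10K", "10K - 20K", "20K - 40K", "40K - 80K", "over 80K"]
# Responses with no place in the ordering; treated here as missing (an assumption).
NON_ORDERED = {"don't know", "rather not say"}


def encode_wage(response):
    """Return (ordinal_code, is_missing) for one survey response."""
    if response in NON_ORDERED:
        return None, True
    # list.index raises ValueError for an unexpected label, which is intentional.
    return ORDERED_LEVELS.index(response), False


def missing_rate_by_group(rows, group_key):
    """Share of missing wage responses within each level of one predictor."""
    counts = defaultdict(lambda: [0, 0])  # group -> [n_missing, n_total]
    for row in rows:
        _, is_missing = encode_wage(row["wage"])
        counts[row[group_key]][0] += int(is_missing)
        counts[row[group_key]][1] += 1
    return {g: n_missing / n_total for g, (n_missing, n_total) in counts.items()}


# Hypothetical toy data; "employed" is an invented predictor.
rows = [
    {"wage": "under 10K", "employed": "no"},
    {"wage": "rather not say", "employed": "no"},
    {"wage": "don't know", "employed": "no"},
    {"wage": "20K - 40K", "employed": "yes"},
    {"wage": "over 80K", "employed": "yes"},
    {"wage": "rather not say", "employed": "yes"},
]

rates = missing_rate_by_group(rows, "employed")
print(rates)
```

If the rates differ clearly between groups, that is a hint that the missingness depends on the predictors, which is what the imputation model would then need to account for.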
