I fitted a binary logit model with a lasso regularization term to predict the outcome in an unbalanced dataset, where the minority class makes up 2% of observations. I undersampled the majority class to get a 50/50 split of the classes.
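For concreteness, the downsampling step described above can be sketched roughly as follows. This is only an illustration on synthetic labels, not the actual code or data: the function name `downsample_majority`, the seeds, and the 100,000-row size are assumptions made for the example.

```python
import random


def downsample_majority(labels, seed=0):
    """Return sorted row indices that keep every minority (1) row and an
    equally sized random subset of the majority (0) rows."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    # Sample without replacement so no majority row is duplicated.
    kept_majority = rng.sample(majority, len(minority))
    return sorted(minority + kept_majority)


# Synthetic labels with roughly a 2% positive rate (illustrative only).
label_rng = random.Random(42)
labels = [1 if label_rng.random() < 0.02 else 0 for _ in range(100_000)]

idx = downsample_majority(labels)
balanced = [labels[i] for i in idx]
share_positive = sum(balanced) / len(balanced)  # 0.5 by construction
```

The model is then fitted on the rows in `idx` only, so the sample share of the minority class (50%) no longer matches its population share (about 2%).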
Now I want to estimate the model coefficients, but I get mostly statistically insignificant coefficient estimates when using the whole (unbalanced) dataset. After downsampling, the estimated coefficients become statistically significant and make sense in light of past literature on this topic.
Is it valid to downsample in order to obtain the coefficient estimates, or will this bias the coefficients? The downsampled dataset ends up with about 50,000 records.
I have read about choice-based sampling, but I can't figure out whether it applies to my problem.
Thanks