I've got a small data set of 55 observations with a binary outcome variable of which only 11 are 1's and the rest are 0's.

I was wondering whether the lasso is a useful tool for predicting my outcome here; if not, I thought I'd still learn a thing or two.

I can get the model to run and display coefficients and p-values by typing:

dslogit outcomeY x1 x2 x3 x4...xn, controls(c1, c2, c3...cn)

It actually looks great: the p-values are much better than the ones I get from my highly unstable multiple regression (I realize multiple regression is not a great idea with such a small dataset). But knowing that when something seems too good to be true, it usually is, I ask you: what's my mistake here, and what should I be looking out for before I go tell everyone about my magnificent results?

Paze
  • (1) How did you calculate the p-values? - what null hypotheses are relevant to you? (2) What do you want from the model in any case? – Scortchi - Reinstate Monica Jan 09 '20 at 01:17
  • how many predictor variables do you have? – knrumsey Jan 09 '20 at 04:32
  • I have no idea how I calculated the p-values. Like I wrote, I'm just getting started. `dslogit` outputs p-values and odds ratios! I have about 10 predictors. The null hypothesis is that nothing can predict the outcome, so it's a bit far-fetched. – Paze Jan 09 '20 at 09:29
  • Correct calculation & interpretation of p-values for LASSO, or for penalized regression in general, is a thorny problem. [Here](https://stats.stackexchange.com/q/410173/17230) might be a good place to start reading, as well as, of course, the manual for `dslogit`. What I was getting at with my 2nd question is that there might not be any reason to care about p-values if the point of choosing LASSO was to improve out-of-sample predictive performance - & if it was, you ought to be estimating that. – Scortchi - Reinstate Monica Jan 09 '20 at 16:34
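
To make the last comment's suggestion concrete, here is a minimal sketch of estimating out-of-sample predictive performance with stratified 5-fold cross-validation. It is written in Python rather than Stata and is not meant to reproduce `dslogit`: the L1-penalized logistic fit below is a plain proximal-gradient implementation, and the function names (`fit_l1_logit`, `stratified_folds`, `cv_brier`), the penalty `lam=0.05`, the step size, and the data are all assumptions made for illustration. The data are synthetic pure noise with the same shape as in the question (55 observations, 11 events, 10 predictors), so no real signal exists to be found.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_l1_logit(X, y, lam, steps=200, lr=0.2):
    """Minimise mean log-loss + lam * ||b||_1 by proximal gradient descent.

    The intercept b0 is left unpenalised. Illustrative only.
    """
    n, p = len(X), len(X[0])
    b0, b = 0.0, [0.0] * p
    for _ in range(steps):
        g0, g = 0.0, [0.0] * p
        for xi, yi in zip(X, y):
            r = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, xi))) - yi
            g0 += r / n
            for j in range(p):
                g[j] += r * xi[j] / n
        b0 -= lr * g0
        for j in range(p):
            z = b[j] - lr * g[j]
            # Soft-thresholding: the proximal step for the L1 penalty.
            b[j] = math.copysign(max(abs(z) - lr * lam, 0.0), z)
    return b0, b


def stratified_folds(y, k, rng):
    # Deal the indices of each class round-robin so every fold has events.
    folds = [[] for _ in range(k)]
    for label in (0, 1):
        idx = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(idx)
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def cv_brier(X, y, lam, k=5, seed=0):
    """Cross-validated Brier score of the lasso fit and of a prevalence-only baseline."""
    folds = stratified_folds(y, k, random.Random(seed))
    model_err = base_err = 0.0
    for test in folds:
        held_out = set(test)
        train = [i for i in range(len(y)) if i not in held_out]
        b0, b = fit_l1_logit([X[i] for i in train], [y[i] for i in train], lam)
        prevalence = sum(y[i] for i in train) / len(train)
        for i in test:
            p_hat = sigmoid(b0 + sum(bj * xj for bj, xj in zip(b, X[i])))
            model_err += (p_hat - y[i]) ** 2
            base_err += (prevalence - y[i]) ** 2
    return model_err / len(y), base_err / len(y)


# Synthetic stand-in for the question's data: 11 events, 44 non-events,
# 10 predictors that are pure noise by construction.
data_rng = random.Random(1)
y = [1] * 11 + [0] * 44
X = [[data_rng.gauss(0, 1) for _ in range(10)] for _ in y]

model_brier, baseline_brier = cv_brier(X, y, lam=0.05)
print(f"lasso Brier score:    {model_brier:.3f}")
print(f"baseline Brier score: {baseline_brier:.3f}")
```

The comparison that matters is whether the cross-validated Brier score of the model is clearly lower than that of the baseline that always predicts the training prevalence. In a real analysis the penalty would itself be chosen inside each training fold (nested cross-validation), and with only 11 events the estimate will be noisy, so repeating the procedure over several seeds is worth considering.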
