I've noticed that a lot of the medical research I am involved in goes as follows:
We collect data on 300-1000 patients, including all sorts of baseline characteristics such as BMI, age and gender, plus outcome-related measurements. If our outcome is, say, "fracture after operation", these could be angle of fracture, fracture density, pain scores, mobility scores, quality of life scores and so on. Finally we record the outcome itself: whether or not the patient had a fracture after the operation. These outcomes are often binary, and the goal is to see whether any of the independent variables are associated with fractures.
Now the 1st problem is that the outcome is binary and heavily imbalanced: we often end up with only about 30-50 patients out of 1000 who actually had a fracture, so the analysis has far less power than it would if 500 of the patients had fractures.
The 2nd problem is that we have maybe 50 independent variables of diverse types: factors, continuous and binary. (Am I correct to assume that in these cases p > N, because only about 30 patients actually have the outcome, even though the study size N is 1000?)
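To put rough numbers on this, here is a minimal Python sketch of the events-per-variable (EPV) arithmetic for the figures above. The function name `events_per_variable` is made up for illustration, and counting each of the 50 variables as a single model parameter is an assumption (a factor with several levels would use up more).

```python
def events_per_variable(n_patients, n_events, n_params):
    """Return events per variable, using the rarer outcome class as the limit."""
    # For a binary outcome, the smaller of the two classes limits the information.
    limiting = min(n_events, n_patients - n_events)
    return limiting / n_params


n_patients = 1000
n_params = 50  # assumption: one model parameter per independent variable

for n_events in (30, 50):
    epv = events_per_variable(n_patients, n_events, n_params)
    print(f"{n_events} fractures, {n_params} parameters: EPV = {epv:.1f}")
# 30 fractures, 50 parameters: EPV = 0.6
# 50 fractures, 50 parameters: EPV = 1.0
```

Either way the full model would have about one fracture per parameter or fewer, well below the roughly 10 events per variable that is often quoted as a rule of thumb, which is what makes me think of this as something like a p > N situation.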
The 3rd problem is that these studies are often done with little previous knowledge of the subject, so it's hard to pick confounders manually by expert opinion.
Obviously we can't run one large multiple regression with all the variables, because the model would overfit. We also can't run 50 separate regression analyses, one per independent variable, each controlling for, say, age and gender, because we quickly run into a very grim multiple comparison problem.
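How grim the multiple comparison problem gets can be sketched in a few lines of Python. This assumes a significance level of 0.05 and, unrealistically, that the 50 tests are independent and that every null hypothesis is true; the function names are made up for illustration.

```python
def familywise_error_rate(n_tests, alpha=0.05):
    """Probability of at least one false positive across independent true nulls."""
    return 1 - (1 - alpha) ** n_tests


def expected_false_positives(n_tests, alpha=0.05):
    """Expected number of false positives when every null hypothesis is true."""
    return n_tests * alpha


n_tests = 50  # one regression per independent variable
print(f"Chance of at least one false positive: {familywise_error_rate(n_tests):.3f}")
print(f"Expected false positives: {expected_false_positives(n_tests):.1f}")
# Chance of at least one false positive: 0.923
# Expected false positives: 2.5
```

So even if none of the 50 variables were truly associated with fractures, under these assumptions I would expect two or three of them to come out "significant" at the 0.05 level.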
We can't use regularized models either, because we are interested in all 50 variables and whether each one is associated with our outcome. None of them are deemed simply controls, and a regularized model selects among the variables rather than necessarily keeping all of them in the model.
From a statistical viewpoint, what would be your way of handling such a study design? Currently I just run logistic regression models controlling for patient characteristics and am transparent about the fact that the p-values are unadjusted.
I should note that these studies are not meant to invent a new method of treatment or change protocols; they are used to see which variables are of interest for future research.