One possibility that you didn't mention is ridge regression. It has a conceptual relationship to principal components regression (PCR), but instead of making an all-or-none choice about which principal components to include, it shrinks the principal components differentially to avoid overfitting. The result is a separate penalized ridge regression coefficient for each of your original predictors; see page 79 of ESLII for details of the relationship between ridge and PCR. Ridge regression handles correlated predictors well, as each set of such predictors tends to be concentrated in the same principal components. If your main interest is prediction and all of your predictors will be readily available in the future, ridge has the advantage of not throwing away any potentially useful information.
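To make that ridge/PCR relationship concrete, here is a minimal numpy sketch on simulated data (the penalty value `lam` is arbitrary, chosen only for illustration). It shows that ridge shrinks the contribution of the j-th principal component by the factor d_j²/(d_j² + λ) rather than keeping or dropping it outright:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.normal(size=(n, p))
X -= X.mean(axis=0)                      # center predictors, as is standard before ridge
y = X @ rng.normal(size=p) + rng.normal(size=n)
y -= y.mean()

lam = 5.0                                # hypothetical penalty strength
U, d, Vt = np.linalg.svd(X, full_matrices=False)

# Ridge solution via the SVD: the contribution of the j-th principal
# component direction is shrunk by d_j^2 / (d_j^2 + lam)
beta_ridge = Vt.T @ ((d / (d**2 + lam)) * (U.T @ y))

# Identical to solving the penalized normal equations directly
beta_direct = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)
assert np.allclose(beta_ridge, beta_direct)

# Shrinkage factors: near 1 for high-variance components, near 0 for
# low-variance ones -- a graded version of PCR's all-or-none choice
shrinkage = d**2 / (d**2 + lam)
print(np.round(shrinkage, 3))
```

The shrinkage factors decrease smoothly with the component variances, which is exactly the graded weighting described above.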
LASSO represents the other extreme, selecting a subset of predictors while it penalizes their coefficients to avoid overfitting. From your data, with about 150 members of the smaller class, it might select about 10 predictors if you choose the penalty value that minimizes the cross-validation deviance (an appropriate measure of logistic-regression quality, unlike, say, accuracy). From among a set of correlated predictors it will tend to choose the one or few most strongly associated with the outcome in your particular data set, so you will notice some instability in the set of predictors selected if you repeat LASSO on multiple bootstrap samples of your data. That doesn't necessarily pose a problem for prediction, as any of those correlated predictors might do about as well, but be aware that the predictors chosen aren't necessarily the "best" in any general sense.
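A sketch of that workflow with scikit-learn, on simulated data (the sample size, correlation structure, and bootstrap count are made up for illustration). The penalty is chosen by cross-validated log-loss, i.e. deviance, and refitting on bootstrap resamples shows the selection instability within a correlated block:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV

rng = np.random.default_rng(1)
n, p = 400, 20
# First 5 predictors are highly correlated copies of one latent signal
latent = rng.normal(size=n)
X = rng.normal(size=(n, p))
X[:, :5] = latent[:, None] + 0.1 * rng.normal(size=(n, 5))
y = (latent + rng.normal(size=n) > 0).astype(int)

# L1-penalized logistic regression; the penalty (1/C) is chosen by
# cross-validated log-loss (deviance), not accuracy
fit = LogisticRegressionCV(
    penalty="l1", solver="liblinear", scoring="neg_log_loss", Cs=10, cv=5
).fit(X, y)
selected = sorted(np.flatnonzero(fit.coef_))
print("selected predictors:", selected)

# Refit on bootstrap resamples: which members of the correlated block
# (columns 0-4) get picked tends to vary from resample to resample
for b in range(3):
    idx = rng.integers(0, n, size=n)
    boot = LogisticRegression(penalty="l1", solver="liblinear",
                              C=fit.C_[0]).fit(X[idx], y[idx])
    print("bootstrap", b, "->", sorted(np.flatnonzero(boot.coef_)))
```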
Elastic net combines the LASSO and ridge penalties, so it can still select a subset of predictors while tending to shrink correlated predictors together rather than arbitrarily keeping just one of them; it might work well with your data set.
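A hedged sketch with scikit-learn (the `l1_ratios` grid and simulated data are arbitrary; R's `glmnet` fits the same model). The mixing parameter interpolates between ridge (`l1_ratio = 0`) and LASSO (`l1_ratio = 1`), and can itself be chosen by cross-validation:

```python
import numpy as np
from sklearn.linear_model import LogisticRegressionCV

rng = np.random.default_rng(2)
n, p = 300, 15
X = rng.normal(size=(n, p))
y = (X[:, 0] - X[:, 1] + rng.normal(size=n) > 0).astype(int)

# Elastic net = mix of L1 (selection) and L2 (shrinkage) penalties;
# both the penalty strength (1/C) and the mix (l1_ratio) are tuned
# by cross-validated deviance
fit = LogisticRegressionCV(
    penalty="elasticnet", solver="saga", l1_ratios=[0.2, 0.5, 0.8],
    Cs=5, cv=5, scoring="neg_log_loss", max_iter=5000
).fit(X, y)
print("chosen l1_ratio:", fit.l1_ratio_[0])
```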
Your idea of comparing 50 different models, each including only one of the 50 highly correlated predictors, and then choosing the best one would be highly dependent on your particular data set. Moreover, the final model would not account for the fact that you used the same data to select that predictor, so it would tend to overfit and might not work well on other data samples. The penalization of coefficient values imposed by LASSO, ridge, or elastic net is a better choice.
Finally, a warning about any of these approaches when you have categorical predictors in addition to continuous ones. PCR, ridge, LASSO, etc., typically normalize the predictors at the start so that the original scale of measurement (e.g., miles versus millimeters) doesn't influence the result. But what is the best way to "normalize" a binary predictor, or a multi-level categorical predictor? Your knowledge of the subject matter might need to come into play on that issue. See this page and its links for further discussion.
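As a small numpy illustration of why this matters (the 10% prevalence is made up): standardizing a 0/1 dummy divides it by its standard deviation, sqrt(p(1-p)), so a rare category takes a larger standardized "step" from 0 to 1 than a balanced one, and the same coefficient penalty bites differently for the two:

```python
import numpy as np

x_rare = np.r_[np.ones(10), np.zeros(90)]   # dummy with 10% prevalence
x_even = np.r_[np.ones(50), np.zeros(50)]   # dummy with 50% prevalence

# SD of a 0/1 variable is sqrt(p * (1 - p))
sd_rare, sd_even = x_rare.std(), x_even.std()

# After standardizing, a 0 -> 1 change corresponds to 1/sd "units",
# so the penalty treats rare dummies differently from balanced ones
step_rare = 1 / sd_rare    # ~3.33 standardized units
step_even = 1 / sd_even    # 2.0 standardized units
print(step_rare, step_even)
```

Whether that differential treatment is desirable depends on the subject matter, which is why no automatic normalization of categorical predictors is universally "best."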