I have 20 response variables $Y = (Y_1, \dots, Y_{20})$ and 1600 predictor variables $X = (X_1, \dots, X_{1600})$, with 128 observations. I want to know which pairs of predictors in $X$ best predict each $Y_i$.
So I generated all combinations $(Y_i, X_j, X_k)$ and ran a linear regression for each one to get its R-squared. Based on R-squared, I extracted the top 100 combinations for further analysis of which pairs of $X$ are the best predictors of $Y$. (A rough sketch of the procedure is below.)
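A minimal sketch of what I did, assuming the data sit in numpy arrays `Y` (128 x 20) and `X` (128 x 1600); the array names, toy data, and use of `numpy` least squares are just for illustration, not my actual code:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)            # toy data only, to make the sketch runnable
Y = rng.normal(size=(128, 20))
X = rng.normal(size=(128, 1600))

def r_squared(y, x_pair):
    """R^2 of an OLS fit of y on an intercept plus two predictor columns."""
    design = np.column_stack([np.ones(len(y)), x_pair])
    beta, _, _, _ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = np.sum(resid ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

results = []                              # (R^2, i, j, k) for every combination
for i in range(Y.shape[1]):
    y = Y[:, i]
    for j, k in combinations(range(X.shape[1]), 2):   # all pairs of predictors
        results.append((r_squared(y, X[:, [j, k]]), i, j, k))

top100 = sorted(results, reverse=True)[:100]          # keep the 100 best fits
```

In full this is $20 \times \binom{1600}{2} \approx 25.6$ million regressions, so in practice I keep only a running top-100 rather than storing everything.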
I haven't considered multicollinearity between any pair of predictors. Should I?
My goal is to find the best pairs $X_j, X_k$ for predicting each $Y_i$. Can you give some suggestions to improve this procedure and make it statistically valid?