I have a large $10000 \times 5001$ table: $10000$ samples, each with $5001$ features. One of these features is the output variable, so each sample has $5000$ input variables and one output variable.
I know that most of these inputs are irrelevant, so I would like to determine the subset of input variables that best predicts the output variable. What is the best/simplest way to do this in R?
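
To make the setup concrete, here is a minimal simulated stand-in for the table, scaled down to $1000 \times 501$ so it runs quickly. Everything in it is an assumption made for illustration: the column names `x1`, `x2`, ..., the normally distributed inputs, the linear relationship, and the choice of the first 10 inputs as the relevant ones. The real data need not look like this.

```r
set.seed(1)          # make the simulation reproducible
n <- 1000            # samples (10000 in the real table)
p <- 500             # input variables (5000 in the real table)

# Hypothetical inputs: standard-normal features named x1 ... x500
X <- matrix(rnorm(n * p), nrow = n, ncol = p,
            dimnames = list(NULL, paste0("x", seq_len(p))))

# Assumption for illustration only: just the first 10 inputs affect the output
relevant <- 1:10
y <- rowSums(X[, relevant]) + rnorm(n)

# Full table: p input columns plus one output column named y
dat <- data.frame(X, y = y)
dim(dat)
```

The goal would then be a procedure that, given only `dat`, recovers something close to the `relevant` columns.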