You're looking for a needle in a haystack. As noted in the comments and in another answer, you do not have enough data. Even setting the number of features aside, 65 is already a very small sample size for any machine learning model, so adding feature selection on top makes it a pretty doomed problem.
You say that you have between 5 and 10 thousand features, so I'll assume 7500. With 55 training samples, your model will easily overfit. Below you can see a model trained on completely random data that "achieves" a nearly perfect $R^2$.
from sklearn.ensemble import AdaBoostRegressor
import numpy as np
np.random.seed(42)
# 55 samples of pure noise: random targets and 7500 random features
y_train = np.random.rand(55)
X_train = np.random.rand(55, 7500)
model = AdaBoostRegressor(random_state=42)
model.fit(X_train, y_train)
model.score(X_train, y_train)
## 0.9895214625949762
You would probably say, "Hey, wait a minute! This is a train score. What about the test score?" You'd be right: the test score is bad. So say you train the model, validate the results on the test set, and repeat until you find an acceptable result. Notice that your test set is only ten samples, so you only need to get ten numbers right. Let me give another example. Suppose your "model" now returns completely random predictions: how many iterations does it take to stumble on a prediction with a high $R^2$? (The squared Pearson correlation $r^2$ computed below equals $R^2$ for simple linear regression.) Apparently, you need just a few thousand iterations.
import scipy.stats as sp
np.random.seed(42)
best_r2 = 0
y_test = np.random.rand(10)
for i in range(10000):
    # "predict" ten random numbers and check how well they happen to match y_test
    y_pred = np.random.rand(10)
    r, _ = sp.pearsonr(y_pred, y_test)
    r2 = r**2
    if r2 > best_r2:
        best_r2 = r2
        print(f"iter={i}, r2={r2}")
## iter=0, r2=0.49601681572673695
## iter=6, r2=0.6467516405878888
## iter=92, r2=0.6910478084107202
## iter=458, r2=0.6971821688682832
## iter=580, r2=0.6988719722383485
## iter=1257, r2=0.721148489188462
## iter=2015, r2=0.7437673627048644
## iter=2253, r2=0.7842495052355497
## iter=4579, r2=0.8189207386492211
## iter=5465, r2=0.8749525244481782
How does this apply to the machine learning scenario? Imagine that instead of the random "model" you have an actual machine learning model, trained on the training set and validated on the test set, and that you "tune" its random seed over many iterations. If you wait long enough, you will find a completely random solution, on completely random data, that matches your test data well. The same applies to data-based feature selection.
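To make that concrete, here is a minimal sketch on made-up noise data (only 100 features so it runs quickly; none of this comes from your actual setup): keep refitting the same model with different seeds and retain whichever seed scores best on the ten test samples. Because you keep the maximum over many tries, the "best" test $R^2$ can only drift upward, even though there is no signal at all.

from sklearn.ensemble import AdaBoostRegressor
import numpy as np
np.random.seed(42)
X_train, y_train = np.random.rand(55, 100), np.random.rand(55)  # pure noise, 100 features for speed
X_test, y_test = np.random.rand(10, 100), np.random.rand(10)
best_seed, best_score = None, -np.inf
for seed in range(200):
    model = AdaBoostRegressor(random_state=seed).fit(X_train, y_train)
    score = model.score(X_test, y_test)  # R^2 on the ten held-out samples
    if score > best_score:
        best_seed, best_score = seed, score  # keep the luckiest seed
print(best_seed, best_score)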
You can find similar arguments in *How to choose the training, cross-validation, and test set sizes for small sample-size data?*, for example:
- If your sample size is already small I recommend avoiding any data driven optimization. Instead, restrict yourself to models where you can fix hyperparameters by your knowledge about model and application/data. This makes one of the validation/test levels unnecessary, leaving more of your few cases for training of the surrogate models in the remaining cross validation.
Also, rather than using a single held-out test set, it is better to use cross-validation. Keep in mind that with a small sample, cross-validation is still not very reliable (see Varoquaux, 2017) and does not provide a good out-of-sample performance estimate.
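For illustration, a minimal sketch of what that looks like, using random stand-in data and an arbitrarily chosen Ridge model (neither comes from the question). The point is that every sample gets used for testing once, and the per-fold spread gives a feel for how noisy the estimate still is with roughly 13 samples per fold.

from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold, cross_val_score
import numpy as np
np.random.seed(42)
X, y = np.random.rand(65, 8), np.random.rand(65)  # stand-in: 65 samples, a handful of features
cv = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=cv)  # default scoring is R^2 for a regressor
print(scores)  # note how much the per-fold score varies
print(scores.mean(), scores.std())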
TL;DR:
- The more iterations you make, the more likely you are to overfit. Avoid data-based optimization as much as possible: rather than using data-based feature selection, pick a small number (< 10) of meaningful features by hand (see the combined sketch after this list).
- The same applies to model choice: with such small data, you don't want to tune hyperparameters. Use domain knowledge to pick a model that is likely to work for the data; you want a simple model that is less likely to overfit.
- Use cross-validation rather than a held-out set. A test set of ten samples is too small and unreliable; you could easily overfit to it.
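Putting the three points together, here is a minimal sketch of the kind of workflow this suggests. The file name, the column names, and the choice of Ridge with a fixed penalty are all hypothetical placeholders, not something from your data:

import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import RepeatedKFold, cross_val_score
df = pd.read_csv("data.csv")  # hypothetical file with the 65 samples
hand_picked = ["feature_a", "feature_b", "feature_c"]  # chosen by domain knowledge, not by search
X, y = df[hand_picked], df["target"]
cv = RepeatedKFold(n_splits=5, n_repeats=20, random_state=0)  # repeated CV instead of a 10-sample hold-out
scores = cross_val_score(Ridge(alpha=1.0), X, y, cv=cv)  # simple model, hyperparameter fixed up front
print(f"R^2: {scores.mean():.2f} +/- {scores.std():.2f}")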