
I can't manage to train an MLPRegressor with the 'lbfgs' solver to an R2 score better than about -14. How come? First I tried guessing the hidden layer shapes at random, then I even tried GridSearchCV, but that doesn't help much. How can I train it better, and how can I tell when the model is trained to its best accuracy?

How come a simple linear regression reaches an R2 score of at least 40% while the model above sits at -14? (For comparison, a minimal linear-regression sketch is included after the grid-search output below.)

from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.metrics import r2_score
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import mean_squared_error
import math
import numpy as np
from scipy.stats import uniform


X = [
    [390973, 262345, 324807],
    [188322, 120766, 174883],
    [185967, 173290, 175605],
    [179309, 117915, 169950],
    [166298, 40042, 153851],
]

X_test = [
    [164077, 73041, 147249],
    [152734, 52099, 77967],
]

y = [
    8080000,
    1940000,
    3300000,
    1970000,
    624000,
]

y_test = [
    1580000,
    118000,
]

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_test_scaled = scaler.fit_transform(X_test)
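# Note (see the comments below): fit_transform() here re-fits the scaler on the test data;
# scaler.transform(X_test) would reuse the statistics learned from the training set.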

# https://stackoverflow.com/questions/52032019/sklearn-mlp-classifier-hidden-layers-optimization-randomizedsearchcv
class RandIntMatrix(object):
    def __init__(self, low, high, shape=(1,)):
        self.low = low
        self.high = high
        self.shape = shape

    def rvs(self, random_state=None):
        np.random.seed(random_state)
        return np.random.randint(self.low, self.high, self.shape)


parameters = {
    'hidden_layer_sizes': RandIntMatrix(1, 50, (500, 2)).rvs().tolist(),
    'solver': ['lbfgs'],
}

grid = GridSearchCV(MLPRegressor(random_state=1, max_iter=500), parameters, cv=2)
grid.fit(X_scaled, y)
print(grid.score(X_test_scaled, y_test))
print(grid.best_params_)

R2 score and best params output (for hidden_layer_sizes of length 2):

-14.423559240364366
{'hidden_layer_sizes': [11, 24], 'solver': 'lbfgs'}

R2 score and best params output (for hidden_layer_sizes of length 3):

-14.024830648388866
{'hidden_layer_sizes': [7, 19, 49], 'solver': 'lbfgs'}
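For reference, a minimal sketch of the linear-regression baseline mentioned above (sklearn's LinearRegression on the same arrays; the 40% figure comes from my full dataset, not the five rows shown here):

from sklearn.linear_model import LinearRegression

lr_scaler = StandardScaler()
X_lr = lr_scaler.fit_transform(X)        # fit the scaler on the training rows only
X_test_lr = lr_scaler.transform(X_test)  # reuse the training statistics for the test rows

lr = LinearRegression()
lr.fit(X_lr, y)
print(lr.score(X_test_lr, y_test))       # R2 on the held-out rows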
asked by luky
  • Lots of suggestions in https://stats.stackexchange.com/questions/352036/what-should-i-do-when-my-neural-network-doesnt-learn. Once you've tried them all, come back and edit your post to clarify what's giving you trouble. – Sycorax Jan 28 '21 at 22:23
  • @Sycorax the posted link may be good as general advice, but I posted a specific example so that someone wiser can give specific answers and tips. Even now I could name some small details that would relieve the problem a bit. So SO is often not a very helpful site, full of downvoting or thread-closing people, which is not helpful at all IMHO. – luky Jan 30 '21 at 12:10
  • @Sycorax and I even stated the specific problem in the title, if you look, which is not similar to the linked question at all. Anyway, it doesn't matter; I am not the owner of SO, so I don't care how it works here. – luky Jan 30 '21 at 12:11
  • I see no evidence in this question that you’ve tried any of the suggestions in the duplicate aside from scaling the data. As written in the duplicate, neural networks require some experimentation. If you don’t want to do the work, use a different model. In particular, large initializations seem likely to saturate, which is also addressed in the duplicate. – Sycorax Jan 30 '21 at 14:09
  • @Sycorax OK, what do you mean by large initializations? – luky Jan 30 '21 at 14:31
  • Weights are initialized as integers (??) between 1 and 50. Try using one of the modern initializers, which choose **floats** in a range of values to tend to be stable in forward and backward passes. – Sycorax Jan 30 '21 at 15:12
  • @Sycorax aha, those are not weights but the number of neurons per hidden layer; see hidden_layer_sizes in https://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html – luky Jan 30 '21 at 16:06
  • Oh, I see. Since you're asking about why lbfgs can't find a good solution: how are you sure that this is an lbfgs problem? Can another optimizer do better? How do you know you're choosing a good learning rate, a good number of iterations, a good hidden layer size, etc.? The MLP default is ReLU, which has the zero matrix as its Hessian almost everywhere; a quasi-Newton algorithm like lbfgs would seem to struggle in that case. – Sycorax Jan 30 '21 at 16:18
  • Here's a way to understand optimizer configuration. Give the network 0 hidden layers and no activation functions. This special network coincides with linear regression. Then try to fit it. Unless you're extremely lucky, it's unlikely that the default optimization configuration will converge to the same result returned by using a standard OLS estimation method -- you might have to adjust the learning rate, or number of iterations, or convergence criteria. – Sycorax Jan 30 '21 at 16:41
  • @Sycorax well, I tried Adam etc. first, but this one seems to learn much faster, and someone even said it is better for smaller training datasets. When I provided a bit more test data than 4 rows, it reached an R2 score around 82% (though on the test data one row was off by about two orders of magnitude, 1M instead of a 10k y value), so I chose other settings with an R2 score around 60% instead. Anyway, it is interesting why it didn't fit properly even on such a small dataset. One problem, btw, was that I called scaler.fit_transform() on the testing data instead of transform(). – luky Jan 30 '21 at 19:11
  • @Sycorax hmm, I also tried activation functions other than relu with grid search CV, but I think relu was always chosen as the best one. But I can try what you suggest. Btw, does grid search try all combinations of all parameters? – luky Jan 30 '21 at 19:13
  • @Sycorax interesting tip, thanks, I can try it (a sketch of that check is included after these comments). – luky Jan 30 '21 at 19:16
  • @Sycorax I ended up with these parameters for grid search; I found them in various examples and put them together, and they seem to be quite important. The random state is very important too, since it controls the random weight initialization. https://pastebin.com/8YNasFq4 – luky Jan 30 '21 at 19:18
  • Why are you only training the model for the default number of iterations? With other default values? NNs require a lot of experimentation to work. With 5 observations and a small number of iterations, it makes sense that the behavior would tend to be strongly dominated by the randomized initializations. – Sycorax Jan 30 '21 at 23:08
  • @Sycorax I already added more data, about 40 rows. Well, regarding the lbfgs algorithm, there are not many parameters to tune; if you check the sklearn MLPRegressor page, you will notice that for this solver only parameters like alpha, hidden layers, and random state apply. Btw the data itself is a bit chaotic, so maybe I can't expect high accuracy from the algorithm. Also, if I try to use adam or another solver, it always says it didn't converge, but lbfgs doesn't say this, so I assume it trains better; on the other hand it doesn't use more iterations. – luky Jan 31 '21 at 10:13
  • @Sycorax btw here is the performance report I get (for the MLP, and I also tried linear regression): https://pastebin.com/Kjwt1Zvk – luky Jan 31 '21 at 10:17
  • If you collect your experiments together and organize them as a coherent explanation of what you've tried and what you've learned, this question would be eligible for reopening. These comments show that you've tried the common-sense stuff, so a revised and expanded version of this question would be distinct from the duplicate. – Sycorax Jan 31 '21 at 15:50
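
A sketch of the sanity check suggested in the comments: restrict the MLP so it can only represent a linear function and compare it with ordinary least squares. Since an empty hidden_layer_sizes may not be accepted by every sklearn version, a single small identity-activation layer is used here (a composition of linear layers is still linear); the max_iter and alpha values are assumptions to experiment with, not known-good settings.

from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor

linear_mlp = MLPRegressor(
    hidden_layer_sizes=(4,),   # with identity activation the model stays linear
    activation='identity',
    solver='lbfgs',
    max_iter=5000,             # assumption: give the optimizer plenty of iterations
    alpha=0.0,                 # assumption: no regularization, to mimic plain OLS
    random_state=1,
)
linear_mlp.fit(X_scaled, y)

ols = LinearRegression().fit(X_scaled, y)

# If the optimizer configuration is reasonable, these two training scores should be close.
print(linear_mlp.score(X_scaled, y))
print(ols.score(X_scaled, y))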

0 Answers