Why does this code use MinMaxScaler to preprocess S&P 500 index data?

Question

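import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# sp500 is the single-column DataFrame of index values loaded earlier in the notebook
scaler = MinMaxScaler()
sp500_scaled = pd.Series(scaler.fit_transform(sp500).squeeze(),
                         index=sp500.index)
sp500_scaled.describe()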

This code is from the book Machine Learning for Algorithmic Trading:

https://github.com/stefan-jansen/machine-learning-for-trading/blob/main/19_recurrent_neural_nets/01_univariate_time_series_regression.ipynb

The notebook uses a two-layer RNN to predict the S&P 500 index, a time series of stock market index values. The task is univariate time-series regression: the model takes the previous 63 time steps as input and predicts the value at the next time step.
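To make the 63-step setup concrete, here is a minimal sketch of how such input/target pairs could be built. The helper name `make_windows` and the toy series are my own illustration, not code from the notebook, which may construct its windows differently.

```python
import numpy as np

def make_windows(series, window_size=63):
    """Split a 1-D series into (inputs, targets) pairs.

    Each input row holds `window_size` consecutive values and the
    matching target is the value that immediately follows that window.
    (Hypothetical helper for illustration only.)
    """
    values = np.asarray(series, dtype=float)
    n_samples = len(values) - window_size
    X = np.stack([values[i:i + window_size] for i in range(n_samples)])
    y = values[window_size:]
    return X, y

# Toy series standing in for the scaled index values
toy = np.arange(100, dtype=float)
X, y = make_windows(toy, window_size=63)
print(X.shape, y.shape)
```

With the 2011 rows shown in the question, the same windowing would give 2011 - 63 = 1948 samples.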

It uses MinMaxScaler to transform all S&P 500 data to [0,1] before feeding it into the RNN model. Before scaling, the data looks like this:

DATE    SP500
2012-01-04  1277.30
2012-01-05  1281.06
2012-01-06  1277.81
2012-01-09  1280.70
2012-01-10  1292.08
... ...
2019-12-24  3223.38
2019-12-26  3239.91
2019-12-27  3240.02
2019-12-30  3221.29
2019-12-31  3230.78
2011 rows × 1 columns

After scaling with MinMaxScaler, it looks like this:

DATE
2012-01-04    0.000000
2012-01-05    0.001916
2012-01-06    0.000260
2012-01-09    0.001732
2012-01-10    0.007530
                ...   
2019-12-24    0.991522
2019-12-26    0.999944
2019-12-27    1.000000
2019-12-30    0.990457
2019-12-31    0.995292
Length: 2011, dtype: float64
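For reference, with the default feature_range of (0, 1), MinMaxScaler computes (x - min) / (max - min) per column. The sketch below applies that formula by hand to the ten rows printed above. It assumes the minimum and maximum of the full series are 1277.30 and 3240.02, which is consistent with those rows being scaled to 0.000000 and 1.000000.

```python
import numpy as np

# The ten values printed in the question (first five and last five rows)
prices = np.array([1277.30, 1281.06, 1277.81, 1280.70, 1292.08,
                   3223.38, 3239.91, 3240.02, 3221.29, 3230.78])

# Assumption: these rows contain the series minimum and maximum
lo, hi = prices.min(), prices.max()

# Min-max scaling: subtract the minimum, divide by the range
scaled = (prices - lo) / (hi - lo)
print(np.round(scaled, 6))
```

The rounded results agree with the scaled values shown above, for example 0.001916 for 2012-01-05 and 0.991522 for 2019-12-24.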

I don't understand why it is necessary to preprocess the time-series data with MinMaxScaler. After training the model, the predictions are of course scaled back to the original range to compute the real prediction error.
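The scaling-back step is just the inverse of the min-max formula. Here is a minimal sketch under the same assumption that the series minimum and maximum are 1277.30 and 3240.02. The helper `inverse_min_max` and the "predictions" are made up for illustration. With scikit-learn, the fitted scaler's `inverse_transform` method does the same job.

```python
import numpy as np

# Assumed minimum and maximum of the original series (taken from the question's output)
lo, hi = 1277.30, 3240.02

def inverse_min_max(scaled, lo, hi):
    """Map values from [0, 1] back to the original price range.
    (Hypothetical helper for illustration only.)"""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo

# Hypothetical model outputs in scaled units
scaled_pred = np.array([0.990457, 0.995292])
restored = inverse_min_max(scaled_pred, lo, hi)
print(np.round(restored, 2))
```

Because the transform is linear, an error measured in scaled units converts to index points by multiplying by (hi - lo).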

Scalers are used to remove discrepancies between the different units of the features. Also, Stack Overflow might be a better venue for software debugging questions. — msuzen, Jan 05 '22 at 00:08
Scaling also avoids biasing dimensionality-reduction techniques such as PCA. — Mayeul sgc, Jan 05 '22 at 02:19
