
I'm fitting some data with a Gaussian Process (GP) in Scikit-Learn. As I understand it, the GP requires scaling both X (input features) and Y (outputs) to a standard normal distribution (mean = 0, std = 1).

I usually use the following code to scale my data:

from sklearn.preprocessing import StandardScaler
x_scaler = StandardScaler()
y_scaler = StandardScaler()
X_train_scale = x_scaler.fit_transform(X_train)
Y_train_scale = y_scaler.fit_transform(Y_train)

After fitting, I predict on the scaled test inputs:

gp = GaussianProcessRegressor() # kernel was defined specifically for each task
gp.fit(X_train_scale, Y_train_scale)
X_test_scale = x_scaler.transform(X_test)
Y_test, std = gp.predict(X_test_scale, return_std=True)

Then, to get back to the original scale that humans can understand, I think we can re-scale the Y values using the following line:

Y_test_real = y_scaler.inverse_transform(Y_test)

But I don't know the right way to re-scale std. My question is: how do we re-scale the std if we scaled Y to a normal distribution at the beginning? This value is very important to me because it gives the confidence interval. Right now, I am using the following line:

std_real = y_scaler.inverse_transform(std)

But the std_real calculated by the line above is very large and not reasonable. I'm not sure whether we could just scale Y to [0, 1] instead of scaling it to a normal distribution. Or, when Y is small, such as < 10, could we skip scaling Y altogether?

Thanks!

Rocco

1 Answer


The inverse transform of the standard deviation is wrong: mean and standard deviation have to be transformed differently. inverse_transform undoes both the scaling and the shift, which is correct for the posterior mean but adds a spurious offset to the standard deviation (that is why your std_real comes out so large). Here's a brief explanation:

Transform

Let's assume a random variable $Y$ with mean $\mu_Y$ and variance $\sigma^2_Y$. The scaler subtracts some constant $a$ and divides the result by a factor $b$, so the transformed variable equals $Z = \frac{Y-a}{b}$. It has mean
\begin{equation*}
\mu_Z = \mathrm{E}\{Z\} = \frac{\mu_Y-a}{b},
\end{equation*}
and variance
\begin{equation*}
\begin{split}
\sigma^2_Z &= \mathrm{E}\{(Z-\mu_Z)^2\} = \mathrm{E}\{Z^2 - 2Z\mu_Z + \mu^2_Z\} = \mathrm{E}\{Z^2\} - \mu_Z^2\\
&= \frac{1}{b^2}\left(\mathrm{E}\{Y^2 - 2aY + a^2\} - \left(\mu_Y^2 - 2a\mu_Y + a^2\right)\right)\\
&= \frac{1}{b^2}\left(\mathrm{E}\{Y^2\} - \mu_Y^2\right) = \frac{\sigma^2_Y}{b^2}.
\end{split}
\end{equation*}
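As a quick numerical sanity check (a sketch on synthetic data; the numbers are arbitrary illustration values, not from the question), the result $\sigma_Z = \sigma_Y / b$ can be verified directly:

import numpy as np

# Synthetic Y with mean 5 and std 2 (arbitrary values)
rng = np.random.default_rng(0)
y = rng.normal(loc=5.0, scale=2.0, size=100_000)

a, b = y.mean(), y.std()   # the constants the scaler uses
z = (y - a) / b            # the transformed variable Z

print(z.mean())  # ~0
print(z.std())   # ~1, i.e. sigma_Y / b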

Inverse Transform

Now, if we want to do the inverse transform, for the mean we have
\begin{equation*}
\mu_Y = b\cdot\mu_Z + a.
\end{equation*}
The standard deviation, however, is independent of $a$; only the scale factor remains:
\begin{equation*}
\sigma_Y = b\cdot\sigma_Z.
\end{equation*}
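Incidentally, if you'd rather stay with sklearn's StandardScaler than write a custom class, the same two formulas can be expressed through its fitted attributes mean_ and scale_ (this assumes y_scaler was fitted as in the question):

# mean_ plays the role of a, scale_ the role of b from the derivation above
Y_test_real = Y_test * y_scaler.scale_ + y_scaler.mean_  # equivalent to inverse_transform
std_real    = std * y_scaler.scale_                      # scale only, no shift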

Custom Scaler Implementation

import numpy as np

class MyScaler:
    '''Standardizes data; inverts the mean and the std separately.'''

    def fit(self, x):
        # Population std (ddof=0), same convention as StandardScaler
        self.mu  = np.mean(x)
        self.std = np.std(x)

    def transform(self, x):
        return (x - self.mu) / self.std

    def inverse_transform_mean(self, x):
        '''To transform the mean: undo both scaling and shift'''
        return x * self.std + self.mu

    def inverse_transform_std(self, x):
        '''To transform the standard deviation: undo the scaling only'''
        return x * self.std
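A quick round-trip check (toy numbers chosen just for illustration) confirms that the two inverse transforms behave as derived above:

y = np.array([1.0, 3.0, 5.0, 7.0])  # toy data

scaler = MyScaler()
scaler.fit(y)
z = scaler.transform(y)

print(scaler.inverse_transform_mean(z))       # recovers y exactly
print(scaler.inverse_transform_std(z.std()))  # recovers y.std()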

Scaler Usage

y_scaler = MyScaler()
y_scaler.fit(Y_train)
Y_train_scale = y_scaler.transform(Y_train)

# Do your GPR fitting stuff here...
# It returns the posterior mean Y_test
# and the posterior standard deviation std

Y_test_real = y_scaler.inverse_transform_mean(Y_test)
std_real    = y_scaler.inverse_transform_std(std)
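For completeness, here is a minimal end-to-end sketch on synthetic 1-D data (the kernel, the data, and the test grid are assumptions for illustration only; X is left unscaled here for brevity):

import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

# Hypothetical training data with a large offset, so scaling Y matters
rng = np.random.default_rng(0)
X_train = rng.uniform(0, 10, size=(50, 1))
Y_train = 100.0 + 20.0 * np.sin(X_train).ravel() + rng.normal(0.0, 1.0, 50)
X_test  = np.linspace(0, 10, 200).reshape(-1, 1)

y_scaler = MyScaler()
y_scaler.fit(Y_train)
Y_train_scale = y_scaler.transform(Y_train)

gp = GaussianProcessRegressor(kernel=RBF())  # kernel picked arbitrarily here
gp.fit(X_train, Y_train_scale)
Y_test, std = gp.predict(X_test, return_std=True)

# Back to the original units: shift-and-scale the mean, scale-only the std
Y_test_real = y_scaler.inverse_transform_mean(Y_test)
std_real    = y_scaler.inverse_transform_std(std)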