I'm working with the following code for gradient descent for simple linear regression:
import numpy as np
def gradient_descent(x, y):
    mCurr = 0
    bCurr = 0
    iterations = 1000
    n = len(x)
    learning_rate = 0.0005
    for i in range(iterations):
        y_predicted = mCurr * x + bCurr
        # mean squared error of the current fit
        cost = (1 / n) * sum([val ** 2 for val in (y - y_predicted)])
        # partial derivatives of the cost with respect to m and b
        md = -(2 / n) * sum(x * (y - y_predicted))
        bd = -(2 / n) * sum(y - y_predicted)
        mCurr = mCurr - learning_rate * md
        bCurr = bCurr - learning_rate * bd
        print("m {}, b {}, cost {} iteration {}".format(mCurr, bCurr, cost, i))
x = np.array([20, 43, 63, 26, 53, 31, 58, 46, 58, 70, 46, 53, 60, 20, 63, 43, 26, 19, 31, 23])
y = np.array([120, 128, 141, 126, 134, 128, 136, 132, 140, 144, 128, 136, 146, 124, 143, 130, 124, 121, 126, 123])
gradient_descent(x,y)
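As a sanity check (not part of the implementation I'm debugging), I compare against NumPy's closed-form least-squares fit, which gives the slope and intercept I'd expect the gradient descent to converge to:

import numpy as np

x = np.array([20, 43, 63, 26, 53, 31, 58, 46, 58, 70, 46, 53, 60, 20, 63, 43, 26, 19, 31, 23])
y = np.array([120, 128, 141, 126, 134, 128, 136, 132, 140, 144, 128, 136, 146, 124, 143, 130, 124, 121, 126, 123])

# Closed-form least-squares fit for comparison; polyfit returns [slope, intercept].
m_ref, b_ref = np.polyfit(x, y, 1)
print("reference m {}, b {}".format(m_ref, b_ref))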
I'm facing two kinds of errors: 1) The values of the slope (m) and the y-intercept (b) keep alternating between negative and positive while their magnitudes keep growing:
m -5.0820333396910965e+35, b -1.0389559623628228e+34, cost 4.57760985674227e+74 iteration 997
m 5.5128418892908475e+35, b 1.1270291963079477e+34, cost 5.386601277550346e+74 iteration 998
m -5.980170468178623e+35, b -1.2225684777262613e+34, cost 6.338564061017721e+74 iteration 999
2) When I try to get rid of this behaviour through hyperparameter tuning (i.e. changing the learning rate or the number of iterations), after a certain number of iterations I get NaN values for m and b and an infinite value for the cost. Roughly, the kind of tuning I've been doing looks like the sketch below (the specific learning rates and iteration counts are just examples, not the exact values I tried):
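# Hypothetical sweep over learning rates / iteration counts -- the exact values
# here are only examples of the kind of tuning I attempted, using the same x and y as above.
for lr in [0.0005, 0.0001, 0.00005, 0.00001]:
    for iters in [1000, 5000, 10000]:
        m, b = 0.0, 0.0
        n = len(x)
        for i in range(iters):
            y_pred = m * x + b
            # same update rule as in gradient_descent above
            m -= lr * (-(2 / n) * sum(x * (y - y_pred)))
            b -= lr * (-(2 / n) * sum(y - y_pred))
        print("lr {}, iterations {}: m {}, b {}".format(lr, iters, m, b))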
Does anyone know what I can do to fix this? I've already tried changing the learning rate and the number of iterations.