
I am trying to do linear regression with a single feature: predicting height from weight. Gradient descent took too many epochs to converge, so I applied a min-max scaler, and training then converged to the optimum quickly.

However, the predictions are now too high. What do I need to do to get correct predictions? Here's my code:

import numpy as np

def min_max_scaler(arr):
    # The min/max are computed locally and then discarded, so they
    # cannot be reused to scale new data the same way.
    x = arr.copy()
    minimum = np.min(x, axis=0)
    maximum = np.max(x, axis=0)
    x = (x - minimum) / (maximum - minimum)
    return x

class LinearRegression:
    def __init__(self, theta):
        self.theta = theta

    def predict(self, X):
        return X @ self.theta

    def compute_cost(self, X, y):
        # Mean squared error.
        yhat = self.predict(X)
        m = len(y)
        return (1 / m) * np.sum((yhat - y) ** 2)

    def train(self, X, y, alpha, epochs):
        m, n = X.shape
        cost_history = np.zeros(epochs)
        for i in range(epochs):
            nabla = np.zeros(n)
            for j in range(n):
                # Partial derivative of the MSE with respect to theta[j].
                nabla[j] = (2 / m) * np.sum((self.predict(X) - y) * X[:, j])
            self.theta -= alpha * nabla
            cost_history[i] = self.compute_cost(X, y)
        return cost_history
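As an aside, the per-coefficient gradient loop in `train` can be collapsed into a single matrix product. A minimal sketch of the equivalence (the function names here are illustrative, not from the post):

```python
import numpy as np

def gradient_loop(theta, X, y):
    # Gradient of the MSE computed one coordinate at a time,
    # mirroring the inner loop in `train`.
    m, n = X.shape
    nabla = np.zeros(n)
    for j in range(n):
        nabla[j] = (2 / m) * np.sum((X @ theta - y) * X[:, j])
    return nabla

def gradient_vectorized(theta, X, y):
    # The same gradient as a single matrix product: X^T times the residuals.
    m = len(y)
    return (2 / m) * X.T @ (X @ theta - y)
```

Both return the same vector; the vectorized form avoids the Python-level loop over columns.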

  • Dear @Nabin, welcome to SO. In order to help you better, could you please provide details about the predictions being "too high"? Ideally, it would help to provide a small dataset where the problem occurs, in order to illustrate what you mean. – Roland Aug 04 '20 at 08:11
  • The way the `min_max_scaler` function is defined, you're only retrieving the max/min of a particular array, but not saving it for the future. That's not how scaling should be done: you want to store the scaling values used for your training data so you can use the same values for your testing data. See: https://stats.stackexchange.com/questions/174823/how-to-apply-standardization-normalization-to-train-and-testset-if-prediction-i/174865#174865 – Sycorax Aug 04 '20 at 12:53
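Following the comment above, one way to keep the training statistics around is a fit/transform-style scaler. This is a sketch under that assumption (the class and method names are illustrative, not from the post):

```python
import numpy as np

class MinMaxScaler:
    """Stores the training min/max so the same mapping is reused later."""

    def fit(self, X):
        # Learn the scaling parameters from the training data only.
        self.minimum = np.min(X, axis=0)
        self.maximum = np.max(X, axis=0)
        return self

    def transform(self, X):
        # Apply the mapping learned during fit. New points outside the
        # training range may land outside [0, 1], which is expected.
        return (X - self.minimum) / (self.maximum - self.minimum)

    def fit_transform(self, X):
        return self.fit(X).transform(X)
```

At prediction time, new weights must pass through `transform` (with the stored training min/max) before being fed to `predict`; and if the target `y` was scaled too, the model's output has to be mapped back through the inverse of that scaling.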

0 Answers