
I got confused when I used degree 10 and got 11 outputs. I checked https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html and there seems to be one extra column added by PolynomialFeatures.
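A minimal sketch of what I mean (assuming a single input feature and made-up data):

import numpy as np
from sklearn.preprocessing import PolynomialFeatures

x = np.arange(5).reshape(-1, 1)               # single feature, 5 samples
print(PolynomialFeatures(10).fit_transform(x).shape)
# (5, 11): one column of ones plus the powers x^1 ... x^10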

For instance, in the example that is provided on that page:

import numpy as np 
from sklearn.preprocessing import PolynomialFeatures 
X = np.arange(6).reshape(3, 2) 
X
array([[0, 1],
       [2, 3],
       [4, 5]])
poly = PolynomialFeatures(2) 
poly.fit_transform(X)

Returns

array([[ 1.,  0.,  1.,  0.,  0.,  1.],
       [ 1.,  2.,  3.,  4.,  6.,  9.],
       [ 1.,  4.,  5., 16., 20., 25.]])

What is the purpose of the "1." elements at the beginning of each row?

folderj

1 Answer


The first column is the input raised to the power of zero, i.e. a constant 1. It works as the intercept in a regression model.

Check this question and answers for details: When is it ok to remove the intercept in a linear regression model?
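
A minimal sketch of the idea (with made-up data; LinearRegression is used for illustration, but the same applies to other linear models):

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
x = rng.uniform(-1, 1, size=(50, 1))
y = 3 + 2 * x[:, 0] + x[:, 0] ** 2            # true intercept is 3

X_poly = PolynomialFeatures(2).fit_transform(x)   # columns: 1, x, x^2

# The column of ones already plays the role of the intercept,
# so the model's own intercept is switched off.
model = LinearRegression(fit_intercept=False).fit(X_poly, y)
print(model.coef_)        # approximately [3., 2., 1.]; the first coefficient is the intercept
print(model.intercept_)   # 0.0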

Ale
  • I see, then any further regression has to have fit_intercept=False set. If set to True, it returns 0s for the intercept. – folderj Jan 10 '21 at 19:49
  • Did you already include a column of ones? Check also this Q/A: https://stackoverflow.com/questions/46779605/in-the-linearregression-method-in-sklearn-what-exactly-is-the-fit-intercept-par – Ale Jan 10 '21 at 20:13
  • Yes, after I run PolynomialFeatures(2).fit_transform(x_train) on my train set and then use Lasso with fit_intercept=False or fit_intercept=True, I get 3 coefficients. In the latter case (set to True) the first of the three is zero: [1.71 1.81 0.61] vs. [0. 1.91 0.68]. I'm bootstrapping, so I'm getting many results, but the 0s are consistent. – folderj Jan 10 '21 at 20:23
  • This is different from the answer in your link, so it's not clear what happens when it's set to True, but I guess it should never be set to True. – folderj Jan 10 '21 at 20:30
  • Sorry, but I don't know the details behind your comment, so I cannot help you on that; it would need another question. There is no need to include another column of ones if you use PolynomialFeatures(), since it already adds one. – Ale Jan 10 '21 at 21:01
  • Alternatively, set `include_bias=False` in `PolynomialFeatures`. This might be preferable, especially if you want to use a regularized model downstream, so that you don't penalize the intercept term. – Ben Reiniger Jan 11 '21 at 01:57
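
A minimal sketch of that alternative (with made-up data and an arbitrary alpha): drop the bias column via include_bias=False and let Lasso fit its own intercept, which is then not penalized by the regularization:

import numpy as np
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.RandomState(0)
x = rng.uniform(-1, 1, size=(100, 1))
y = 3 + 2 * x[:, 0] + x[:, 0] ** 2 + rng.normal(scale=0.1, size=100)

# include_bias=False drops the column of ones; Lasso then fits (and does not
# penalize) its own intercept, since fit_intercept=True is the default.
model = make_pipeline(PolynomialFeatures(degree=2, include_bias=False),
                      Lasso(alpha=0.01))
model.fit(x, y)
print(model[-1].intercept_)   # close to the true intercept of 3
print(model[-1].coef_)        # coefficients for x and x^2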