I'm trying to understand what sklearn's LinearRegression (which should be using ordinary least squares) is doing when there are more features than observations.
import numpy as np
from sklearn.linear_model import LinearRegression
X = np.random.normal(size=(10, 20))  # 10 observations, 20 features (p > n)
y = np.random.normal(size=10)
reg = LinearRegression().fit(X, y)
reg.coef_
Result:
array([ 0.08483326, 0.10681214, 0.21719561, 0.09594577, -0.03162432,
-0.12966986, 0.06547396, 0.23470907, 0.03750261, -0.09405698,
-0.05079304, -0.06141368, 0.04811855, 0.19887924, -0.02054755,
0.21558906, 0.06054536, 0.08791492, 0.01750048, -0.03848975])
How were these coefficients generated? With 20 features and only 10 observations the system is underdetermined, so there are no residual degrees of freedom, and fitting the same regression in R returns NA for the extra coefficients. I'm aware that penalized regression can handle such cases, but I don't see how plain LinearRegression arrives at a unique set of coefficients here.
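
For what it's worth, here is the comparison I was planning to run. It assumes (just my guess, not something I've confirmed in the sklearn source) that the returned coefficients are the minimum-norm least-squares solution of the centered system, which np.linalg.lstsq also produces for underdetermined problems:

import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)   # fixed seed so the check is repeatable
X = rng.normal(size=(10, 20))    # 10 observations, 20 features
y = rng.normal(size=10)

reg = LinearRegression().fit(X, y)

# As far as I understand, LinearRegression handles the intercept by centering
# X and y, so compare coef_ against the minimum-norm least-squares solution
# of the centered system.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
beta_min_norm, *_ = np.linalg.lstsq(Xc, yc, rcond=None)

print(np.allclose(reg.coef_, beta_min_norm))  # True if my guess is right

If this prints True, then the coefficients would just be the pseudoinverse (minimum-norm) solution, but I'd appreciate confirmation of what LinearRegression actually does internally.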