Can Simple Multiple Regression be applied when you have a Training Set where the number of features is greater than the number of examples?

Question

Suppose we have a Training Set $X$ of size $n\times d$, where $n$ represents the number of examples and $d$ represents the number of features.

Assume that $d>n$, so the number of features is greater than the number of examples/observations.

In this case a simple multiple regression model CANNOT be learned, right? I think so because fitting it requires solving the system:

$$X^TXw=X^Ty$$

$w$ is the vector whose elements are the parameters of the regression model.
$y$ is the vector whose elements are the numerical target values in the training set.

Since $\operatorname{rank}(X)=n$ (in fact, the examples in the training set are independent), we get $\operatorname{rank}(X^TX)=n$, and because $n<d$, the $d\times d$ matrix $X^TX$ does not have full rank and is therefore not invertible.

Is this reasoning correct?

yes thank you too! – Francesco Ladogana Jun 14 '21 at 16:01

score 1 · Accepted Answer · answered Jun 14 '21 at 15:41

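Yes, the reasoning is correct. When $d > n$, the $d \times d$ matrix $X^\top X$ has rank at most $n < d$, so the system of normal equations $X^\top X w = X^\top y$ does not have a unique solution. The system is still consistent, so it has infinitely many solutions rather than none.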
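
As a rough numerical illustration of this point, the sketch below builds a random design matrix with more features than examples and checks two things: that $X^\top X$ is rank deficient, and that two different weight vectors both satisfy the normal equations. The sizes `n = 5`, `d = 8`, the random data, and the helper `normal_eq_residual` are assumptions made up for the example; they do not come from the question.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 5, 8                      # hypothetical sizes with d > n
X = rng.standard_normal((n, d))  # random design matrix (assumed data)
y = rng.standard_normal(n)       # random targets (assumed data)

G = X.T @ X                      # d x d Gram matrix from the normal equations
rank_X = np.linalg.matrix_rank(X)
rank_G = np.linalg.matrix_rank(G)  # equals rank(X), so it is below d

# One particular solution: the minimum-norm least-squares solution.
w_min, *_ = np.linalg.lstsq(X, y, rcond=None)

# A second solution: add a vector from the null space of X.
# With full_matrices=True (the default) Vt is d x d, and its last
# d - rank(X) rows span the null space of X.
_, _, Vt = np.linalg.svd(X)
null_vec = Vt[-1]
w_other = w_min + 3.0 * null_vec

def normal_eq_residual(w):
    """Norm of X^T X w - X^T y (hypothetical helper for this sketch)."""
    return np.linalg.norm(G @ w - X.T @ y)

print("rank(X) =", rank_X, " rank(X^T X) =", rank_G, " d =", d)
print("residual for w_min:  ", normal_eq_residual(w_min))
print("residual for w_other:", normal_eq_residual(w_other))
print("solutions differ:", not np.allclose(w_min, w_other))
```

Both residuals should come out at the level of floating-point noise even though the two weight vectors differ, which is what "no unique solution" means in practice. `np.linalg.lstsq` still returns an answer here because it picks the minimum-norm solution among the infinitely many candidates.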

Misius

perfect thank you very much! – Francesco Ladogana Jun 14 '21 at 16:01