I've always been puzzled by the discrepancy between the several possible uses of such a basic term as "linear regression":
- Some sources just say it is the computation of the best linear (or, more accurately, affine) map from known inputs $x$ in $X = \mathbb{R}^n$ to known outputs $y$ in $Y = \mathbb{R}$.
- Others place almost no hypothesis on $X$ itself (it may not even have a "dimension"). They just suppose we are given a family $(\phi_i)_{i \in I}$ of linearly independent scalar fields on $X$ (generally living in some regular space like $C^k(X)$ or $L^2(X)$). Linearity only arises because one looks for the best scalar field in the vector space $\mathrm{span}((\phi_i)_{i \in I})$.
One can see that the second definition easily encompasses the first one by choosing $\phi_0(x) = 1$ and $\phi_i(x) = x_i$ for $i \in \{1, \dots, n\}$, so this generic definition has always seemed far more natural to me. But I read so often that linear regression can only create linear boundaries on $X$ (which is obviously false under the second definition) that I am not sure that definition is widespread enough to be considered canonical.
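To make the reduction concrete, here is a minimal sketch in Python/NumPy (my own illustration, not taken from any particular source; the feature choices are arbitrary): with $\phi_0 = 1$ and $\phi_i(x) = x_i$ one recovers ordinary affine regression, while with nonlinear $\phi_i$ the model is still linear in the coefficients but the fitted scalar field is nonlinear in $x$.

```python
import numpy as np

def fit_in_span(phis, X, y):
    """Least-squares fit of the best scalar field in span(phis).

    phis : list of callables, each mapping a sample x (1-D array) to a scalar
    X    : (m, n) array of m known inputs
    y    : (m,) array of known outputs
    Returns the coefficient vector w minimizing ||Phi w - y||^2,
    where Phi[j, i] = phis[i](X[j]).
    """
    Phi = np.array([[phi(x) for phi in phis] for x in X])
    w, *_ = np.linalg.lstsq(Phi, y, rcond=None)
    return w

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
y = X[:, 0] ** 2 + X[:, 1] ** 2 + 0.1 * rng.normal(size=200)

# First definition: phi_0 = 1, phi_i(x) = x_i  -> ordinary affine regression.
affine = [lambda x: 1.0, lambda x: x[0], lambda x: x[1]]
w_affine = fit_in_span(affine, X, y)

# Second definition with nonlinear features: still *linear in w*, but the
# fitted scalar field x -> w . phi(x) is quadratic in x.
quadratic = affine + [lambda x: x[0] ** 2, lambda x: x[1] ** 2]
w_quad = fit_in_span(quadratic, X, y)
print(w_quad)  # roughly [0, 0, 0, 1, 1]: recovers the quadratic field
```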
NB: the same issue applies to linear classification, as it just assigns a category by comparing a linear regression output with a threshold.
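Continuing the sketch above (again my own illustration): thresholding the quadratic fit yields a circular, hence non-linear, decision boundary on $X$, even though the classifier is "linear" in the second sense.

```python
# Classify by thresholding the regression output. With quadratic features the
# decision boundary {x : w . phi(x) = threshold} is a circle in X: non-linear
# in X although linear in the feature coefficients.
def classify(phis, w, x, threshold=1.0):
    return float(np.dot(w, [phi(x) for phi in phis])) > threshold

print(classify(quadratic, w_quad, np.array([0.1, 0.1])))  # inside  -> False
print(classify(quadratic, w_quad, np.array([2.0, 0.0])))  # outside -> True
```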