Is Multiple Linear Regression in 3 dimensions a plane of best fit or a line of best fit?

Question

Our prof is not getting into the math or even the geometric representation of multiple linear regression, and this has me slightly confused.

On the one hand it's still called multiple linear regression, even in higher dimensions. On the other hand, if we have for example $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$ and we can plug in any values we'd like for $X_1$ and $X_2$, wouldn't this give us a plane of possible solutions and not a line?

In general, isn't our surface of prediction going to be a $k$-dimensional hyperplane for $k$ independent variables?

score 13 · Accepted Answer · edited Jun 11 '20 at 14:32

You're right: the solution surface is going to be a hyperplane in general. It's just that the word "hyperplane" is a mouthful, "plane" is shorter, and "line" is even shorter. As you continue on in math, the one-dimensional case comes up more and more rarely, so the tradeoff

Big words for high dimensional, Small words for small dimensional

starts to look, well, backwards.
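To make the question's geometry concrete, here is a minimal Python sketch (assuming NumPy; the data, the true coefficients, and the `predict` helper are hypothetical, made up for illustration). It fits $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$ by least squares and then plugs in a whole grid of $(X_1, X_2)$ values. After centering, the predicted points have rank 2, which is what a plane gives; points along a line would have rank 1.

```python
import numpy as np

# Hypothetical data: a response generated from a known plane plus a little noise.
rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y = 1.0 + 2.0 * x1 - 0.5 * x2 + rng.normal(0, 0.1, n)

# Design matrix: a column of ones for b0, then the two predictors.
X = np.column_stack([np.ones(n), x1, x2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2 = b

def predict(u1, u2):
    """Fitted value Y-hat = b0 + b1*u1 + b2*u2 at any (u1, u2)."""
    return b0 + b1 * u1 + b2 * u2

# Plug in a whole grid of (X1, X2) values, not just points along one line.
g1, g2 = np.meshgrid(np.linspace(0, 10, 5), np.linspace(0, 10, 5))
pts = np.column_stack([g1.ravel(), g2.ravel(), predict(g1, g2).ravel()])

# Centered points on a plane span 2 dimensions (points on a line would give rank 1).
print("coefficients:", np.round(b, 2))
print("rank of centered predictions:", np.linalg.matrix_rank(pts - pts.mean(axis=0)))
```

With $k$ predictors the same check gives rank $k$, matching the $k$-dimensional hyperplane in the question.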

For example, when I see an equation like $Ax = b$, where $A$ is a matrix and $x, b$ are vectors, I call this a linear equation. Earlier in my life, I would have called this a system of linear equations, reserving "linear equation" for the one-dimensional case. But then I got to a point where the one-dimensional case just did not come up very often, while the multi-dimensional case was everywhere.
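As a small illustration of that usage, the least-squares coefficients of a regression solve the normal equations $X^\top X \, b = X^\top y$, which is a single linear equation of the form $Ax = b$ with a vector unknown. The sketch below assumes NumPy, and the tiny design matrix and response are hypothetical; it solves that one equation with `np.linalg.solve` and checks the answer against `np.linalg.lstsq`.

```python
import numpy as np

# Hypothetical design matrix (intercept column plus two predictors) and response.
X = np.array([[1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0],
              [1.0, 2.0, 1.0],
              [1.0, 3.0, 3.0],
              [1.0, 4.0, 2.0]])
y = np.array([1.0, 2.0, 4.0, 5.0, 7.0])

# Normal equations (X^T X) coef = X^T y: one "linear equation" A x = b in a
# vector unknown, i.e. a system of three scalar equations.
A = X.T @ X
rhs = X.T @ y
coef = np.linalg.solve(A, rhs)

# The same coefficients from NumPy's least-squares solver.
coef_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(coef, np.allclose(coef, coef_lstsq))
```

In practice the SVD-based `lstsq` is usually preferred, because forming $X^\top X$ squares the condition number; the normal equations appear here only because they display the $Ax = b$ form.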

This also happens with notation. Ever seen someone write

$$ \frac{\partial f}{\partial x} = 2x $$

That symbol on the left is the name of a function, so to be formal and pedantic, you should write

$$ \frac{\partial f}{\partial x}(x) = 2x $$
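The same point in code: a derivative is a function, and it only becomes a number once you evaluate it at some $x$. This Python sketch uses a central finite difference as a stand-in for the exact derivative (an approximation chosen for illustration; the helper name `derivative` is hypothetical).

```python
def f(x):
    return x ** 2

def derivative(func, h=1e-6):
    """Return a new function approximating func' by a central difference.

    The result is itself a function: it still has to be evaluated at a
    point x, which is what the explicit "(x)" in the formal notation records.
    """
    def dfunc(x):
        return (func(x + h) - func(x - h)) / (2 * h)
    return dfunc

df_dx = derivative(f)   # the function named on the left-hand side
print(df_dx(3.0))       # its value at x = 3, approximately 2 * 3 = 6
```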

It gets worse in multiple dimensions, where the derivative takes two arguments: one is the point at which you take the derivative, and the other is the direction in which you evaluate it. The result looks like

$$ \nabla_{x} f (v)$$

but people get lazy very quickly and begin to drop one argument or the other, leaving it to be understood from context.
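Reading $\nabla_{x} f (v)$ as "the derivative of $f$ at the point $x$, applied to the direction $v$", here is a hedged Python sketch that keeps both arguments explicit. It uses a hypothetical function $f(x, y) = x^2 y + 3y$, computes $\nabla f(p) \cdot v$ from the analytic gradient, and checks it with a finite difference along $v$. The point is called `p` in the code to avoid clashing with the coordinate $x$; NumPy is assumed and all names are illustrative.

```python
import numpy as np

def f(p):
    x, y = p
    return x ** 2 * y + 3 * y

def grad_f(p):
    # Analytic gradient of f: (2xy, x^2 + 3).
    x, y = p
    return np.array([2 * x * y, x ** 2 + 3])

def directional_derivative(p, v):
    """Derivative of f at the point p, applied to the direction v: grad f(p) . v."""
    return grad_f(p) @ v

def directional_derivative_fd(p, v, h=1e-6):
    """Finite-difference check along v: (f(p + h v) - f(p - h v)) / (2h)."""
    return (f(p + h * v) - f(p - h * v)) / (2 * h)

p = np.array([1.0, 2.0])   # where the derivative is taken
v = np.array([0.5, -1.0])  # the direction in which it is evaluated
print(directional_derivative(p, v), directional_derivative_fd(p, v))
```

Dropping either argument (writing just $\nabla f$, or $\nabla f(v)$) is exactly the kind of abbreviation the paragraph describes.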

Professional mathematicians, tongues firmly in cheek, call this abuse of notation. There are subjects in which it would be essentially impossible to express oneself without abusing notation, my beloved differential geometry being a case in point. The great Nicolas Bourbaki expressed the point very eloquently:

As far as possible we have drawn attention in the text to abuses of language, without which any mathematical text runs the risk of pedantry, not to say unreadability.

— Bourbaki (1988)

You even commented on an abuse of notation I fell into above without even noticing it myself!

Technically since you wrote df/dx as a partial derivative, even though the other implied variables would be held as constant, wouldn't the partial derivative technically still be a function of all the variables of the original function, as in df/dx (x, y, ...)?

You're perfectly correct, and this gives a good (unintentional) illustration of what I'm getting at here.

I encounter the derivative in a true one-variable sense so rarely in my day-to-day work and studies that I've essentially forgotten that $\frac{d f}{d x}$ is the correct notation here. I intended the above to be about a one-variable function, but unconsciously signaled otherwise by my use of $\partial$.
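A quick numerical check of the commenter's point, using a hypothetical $f(x, y) = x^2 y$: the partial derivative $\frac{\partial f}{\partial x} = 2xy$ holds $y$ fixed while $x$ varies, yet its value still depends on $y$. The sketch approximates it with a central difference (an illustrative choice, not the only one).

```python
def f(x, y):
    return x ** 2 * y

def df_dx(x, y, h=1e-6):
    """Partial derivative of f with respect to x, via a central difference.

    y is held fixed while x is nudged, but the result still depends on y,
    so the partial derivative is a function of (x, y), not of x alone.
    """
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

# Same x, different y: different values (the exact answer is 2*x*y).
print(df_dx(1.0, 1.0), df_dx(1.0, 5.0))
```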

I guess I think of it as when we say "infinite sum" instead of "the limit of a sum as the number of terms approaches infinity". The way I think about it is that it's fine as long as the conceptual difference is clear. In this case (multiple regression), I wasn't really sure what we were talking about in the first place.

Yeah, that's a consistent way to think about it. The only real difference is that there the situation is so common that we invented additional(*) notation and terminology ($\Sigma$ and "infinite sum") to express it. In other cases, we generalize a concept, and then that generalized concept becomes so ubiquitous that we reuse the old notation or terminology for it.

As lazy people, we want to economize on words in the common cases.

(*) Historically, this is not how infinite sums developed. The limit of partial sums definition was developed a posteriori when mathematicians started encountering situations where it was necessary to reason very precisely.
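The limit-of-partial-sums definition can be illustrated with the geometric series $\sum_{n=0}^{\infty} (1/2)^n = 2$. This Python sketch uses exact `fractions.Fraction` arithmetic and a hypothetical helper `partial_sums` to show that the gap $2 - S_N$ halves with each added term, which is what it means for the "infinite sum" to equal 2.

```python
from fractions import Fraction

def partial_sums(term, n_terms):
    """Return the partial sums S_1, ..., S_N, where S_N = term(0) + ... + term(N - 1)."""
    sums, total = [], Fraction(0)
    for n in range(n_terms):
        total += term(n)
        sums.append(total)
    return sums

# Geometric series sum_{n>=0} (1/2)^n: the "infinite sum" means lim S_N, which is 2.
s = partial_sums(lambda n: Fraction(1, 2 ** n), 20)
gaps = [2 - s_n for s_n in s]   # exact distance to the limit: 2 - S_N = (1/2)^(N - 1)
print(float(s[-1]), gaps[-1])
```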

It's funny that you give the example of partial derivatives because I used to always wonder about that (the joys of self-studying...). By the way (unrelated and not me being pedantic but just wanting to make sure I understand as much as possible) technically since you wrote df/dx as a partial derivative, even though the other implied variables would be held as constant, wouldn't the partial derivative technically still be a function of all the variables of the original function, as in df/dx (x, y, ...)? I guess my question is isn't the partial derivative still a function of all the variables? — jeremy radcliff, Jul 28 '16 at 20:04
Also, thanks for explaining all of that. I guess I think of it as when we say "infinite sum" instead of "the limit of a sum as the number of terms approaches infinity". The way I think about it is that it's fine as long as the conceptual difference is clear. In this case (multiple regression), I wasn't really sure what we were talking about in the first place. I tried to imagine a line in 3D and then realized it didn't make sense if we let several independent variables vary freely, so I just wanted to make sure. — jeremy radcliff, Jul 28 '16 at 20:06
+1 great answer. Sometimes people are lazy, and that causes a lot of confusion. That is why I was trying to ask about notation in this post: http://stats.stackexchange.com/questions/216286/what-are-the-classical-notations-in-statistics-linear-algebra-and-machine-learn — Haitao Du, Jul 28 '16 at 20:07
@MatthewDrury, thank you for taking the time to address my comments. It's very helpful to me because I self-study the vast majority of the math I know, and the lack of surrounding culture and access to mathematicians make places like stackexchange and answers like yours invaluable to me. — jeremy radcliff, Jul 29 '16 at 02:44

Glen_b · Answer 2 · 2016-07-29T01:43:18.230

"Linear" doesn't quite mean what you think it does in this context; it's a bit more general.

Firstly, it's not really a reference to linearity in the x's but to the parameters* ("linear in the parameters").
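For example, a quadratic model $y = b_0 + b_1 x + b_2 x^2$ is curved in $x$ but linear in the parameters, so ordinary least squares fits it like any other linear regression. A minimal sketch, assuming NumPy and noise-free hypothetical data:

```python
import numpy as np

# Hypothetical data from a curve that is quadratic in x.
x = np.linspace(-3, 3, 25)
y = 1.0 - 2.0 * x + 0.5 * x ** 2

# The model is not linear in x, but it is linear in (b0, b1, b2):
# each parameter multiplies a known column of the design matrix.
X = np.column_stack([np.ones_like(x), x, x ** 2])
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 6))
```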

Secondly, a linear function in the linear algebra sense is essentially a linear map; $E(Y|X) = X\beta$ is a linear function in $\beta$-space.
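The linear-map claim can be checked numerically: for a fixed design matrix $X$, the function $\beta \mapsto X\beta$ satisfies $X(a\beta + c\gamma) = aX\beta + cX\gamma$. The matrix, vectors, and scalars below are random or made up for illustration (NumPy assumed).

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 3))        # any fixed design matrix (hypothetical values)
beta, gamma = rng.normal(size=3), rng.normal(size=3)
a, c = 2.5, -0.7

def mean_fn(coef):
    """E(Y | X) = X @ coef, viewed as a function of the coefficient vector."""
    return X @ coef

# Linearity in beta-space: additivity and homogeneity checked together.
lhs = mean_fn(a * beta + c * gamma)
rhs = a * mean_fn(beta) + c * mean_fn(gamma)
print(np.allclose(lhs, rhs))
```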

So a plane (or more generally hyperplane) of best fit is still "linear regression".

* though it will be linear in the supplied x's if you consider the constant column of $1$'s as part of the coordinate vector (or, alternatively, think of it in homogeneous coordinates with normalization of the additional coordinate). Or you could just say that $X\beta$ is linear in $X$ and in $\beta$ (each separately).
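To illustrate the footnote, this hypothetical example compares the affine prediction $b_0 + b_1 x_1 + b_2 x_2$ as a function of $x = (x_1, x_2)$ with the same prediction written as $z \cdot \beta$ for the augmented vector $z = (1, x_1, x_2)$. Scaling $x$ does not scale the affine output, while scaling $z$ does scale $z \cdot \beta$. Note that $2z$ no longer has a leading 1, which is why the footnote mentions normalizing the additional coordinate. The coefficient values are made up (NumPy assumed).

```python
import numpy as np

beta = np.array([1.0, 2.0, -0.5])   # hypothetical (b0, b1, b2)

def affine(x):
    """b0 + b1*x1 + b2*x2: not linear in x when b0 != 0."""
    return beta[0] + beta[1:] @ x

def linear_augmented(z):
    """z @ beta with z = (1, x1, x2): linear in the augmented vector z."""
    return z @ beta

x = np.array([3.0, 4.0])
z = np.concatenate([[1.0], x])

print(affine(2 * x), 2 * affine(x))                      # differ: scaling x does not scale the output
print(linear_augmented(2 * z), 2 * linear_augmented(z))  # agree: scaling z scales the output
print(np.isclose(affine(x), linear_augmented(z)))        # same prediction either way
```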
