
Let’s set up a supervised learning problem with $p$ predictors and $n$ observations. The response variable is univariate. The problem can be regression or classification, though I think a classification problem introduces additional complexity if there are more than two categories.

Square loss is closely related to Euclidean distance: the total square loss is the squared Euclidean distance between the $n$-dimensional prediction vector and the $n$-dimensional vector of observed responses. Euclidean distance is known to behave badly when the dimension is high (whatever “high” means).
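To make the identification concrete, here is a minimal numpy sketch (the data and variable names are illustrative only):

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=500)                      # observed responses
y_hat = y + rng.normal(scale=0.5, size=500)   # some hypothetical predictions

# Total square loss summed over the n observations...
square_loss = np.sum((y - y_hat) ** 2)

# ...equals the squared Euclidean distance between the two n-vectors.
dist_sq = np.linalg.norm(y - y_hat) ** 2

print(np.isclose(square_loss, dist_sq))  # True
```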

Why is Euclidean distance not a good metric in high dimensions?

Nonetheless, square loss is popular even when there are hundreds of observations, and therefore hundreds of dimensions (which should be enough to trigger some of the bizarre behavior of the Euclidean norm).
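That bizarre behavior is easy to demonstrate with the usual relative-contrast experiment; a minimal sketch (the sample sizes and dimensions here are arbitrary choices of mine):

```python
import numpy as np

rng = np.random.default_rng(0)

# Relative contrast (d_max - d_min) / d_min between a reference point and
# random points: it shrinks as the dimension grows, which is the usual
# sense in which Euclidean distance degrades in high dimensions.
for dim in (2, 10, 100, 1000):
    points = rng.uniform(size=(200, dim))
    dists = np.linalg.norm(points[1:] - points[0], axis=1)
    contrast = (dists.max() - dists.min()) / dists.min()
    print(dim, round(contrast, 3))
```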

Yes, if we use square loss for OLS regression under the Gauss-Markov conditions, we get the best linear unbiased estimator (BLUE). Yes, we can do all sorts of inference on the parameters. However, I am thinking of a pure prediction problem, perhaps a complicated neural network. In that kind of problem, inference and interpretation matter much less than predictive accuracy.

So why use square loss when there are many observations?

Dave
  • If your response variable is *univariate*, then you have $n$ points in a single dimension, not in $n$ dimensions. Are you thinking of *multivariate* prediction problems? – Stephan Kolassa Oct 20 '20 at 14:09
  • @StephanKolassa I'm considering $y\in\mathbb{R}^n$ and $\hat{y}\in\mathbb{R}^n$. Square loss is then $d_{L2}(y,\hat{y})^2$. I say this is reasonable, as we often write something like $\hat{\beta} = (X^TX)^{-1}X^Ty$, where we treat $y$ as being in $\mathbb{R}^n$. – Dave Oct 20 '20 at 14:48
  • OK. What do you mean then by your second sentence, "The response variable is univariate."? So what you actually have is a multivariate problem, where you are training on $N$ samples, each of which is an $n$-vector, and evaluating on $K$ samples, each of which is again an $n$ vector? (And I'm not even looking at the predictors yet.) Can you give an example of where this would go into high dimensions? Are you predicting entire EEG time courses, for instance? – Stephan Kolassa Oct 20 '20 at 14:55
  • @StephanKolassa I mean that the response variable is univariate, and there are $n$ observations of it. We then write $y = (y_1,\dots,y_n)\in\mathbb{R}^n$. – Dave Oct 20 '20 at 15:01
  • Well, but then [my initial comment](https://stats.stackexchange.com/questions/492842/square-loss-for-big-data?noredirect=1#comment911101_492842) applies, and you have only a single dimension. I think I'm confused. Can you perhaps give a concrete example of what you are trying to do? – Stephan Kolassa Oct 20 '20 at 15:06
  • @StephanKolassa I observe ten million lions and ten million tigers, noting their top speeds. I regress speed on species using OLS. I say that $y\in\mathbb{R}^\text{20-million}$, as the parameter vector would be $\hat{\beta} = (X^TX)^{-1}X^Ty$. ($X$ is a column of 20-million $1$s and a column of 20-million species labels, say lions as $0$ and tigers as $1$.) A code sketch of this regression appears after the thread. – Dave Oct 20 '20 at 15:51
  • OK. So you have a single dimension and a lot of observations in this single dimension. The Euclidean metric has no problems in such a setting that I am aware of. Can you explain what problems you see? – Stephan Kolassa Oct 20 '20 at 15:58
  • $y\in \mathbb{R}^\text{20-million}$ is not a single dimension. – Dave Oct 20 '20 at 16:01
  • You have 20 million observations in a single dimension, *speed*. A multivariate analysis would predict, e.g., a triple (speed, height, weight), and then you would measure 20 million individuals on these three dimensions. But three dimensions is still not "high-dimensional". Can you explain what problems you see? – Stephan Kolassa Oct 20 '20 at 16:03
  • @StephanKolassa The $y$ vector in $\hat{\beta} = (X^TX)^{-1}X^Ty$ lives in a high-dimensional space, even if we have just a single predictor (or no predictors at all in an intercept-only model). – Dave Dec 16 '21 at 20:21
  • I still don't see it. $y$ is one-dimensional. How does it "exist in a high dimension space"? I am not talking about the predictors at all, only about $y$. You have many observations in a single dimension. Yes, the Euclidean distance has problems in high dimensional settings, but this is a "big data" setting in a *single* dimension. Can you explain what problem you see with having many observations in a single dimension? Note that four other people apparently were as confused as I was... – Stephan Kolassa Dec 16 '21 at 20:28
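For concreteness, here is a minimal numpy sketch of the lions-and-tigers regression from the thread. The sample size is scaled down from 20 million, and the speed means and noise scale are made up for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2_000_000  # scaled down from 20 million to keep memory modest

species = np.repeat([0, 1], n // 2)  # 0 = lion, 1 = tiger
# Made-up top speeds: the means and noise scale are illustrative only.
speed = np.where(species == 0, 80.0, 65.0) + rng.normal(scale=5.0, size=n)

X = np.column_stack([np.ones(n), species])  # intercept + species dummy
# y enters the normal equations as a single length-n vector:
# beta_hat = (X^T X)^{-1} X^T y
beta_hat = np.linalg.solve(X.T @ X, X.T @ speed)
print(beta_hat)  # roughly [80, -15]: lion mean and tiger offset
```

The point of the sketch is only that $y$ enters $(X^TX)^{-1}X^Ty$ as one length-$n$ vector, which is the sense in which the question treats it as living in $\mathbb{R}^n$.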

0 Answers