3

Suppose I have a line on which i have points in non-decreasing order. My intuition tells me that if I want to minimize the squared mean error on some subset of 3 (or other number) points, I would like to take 3 subsequent points, or rather check all 3-subsets of subsequent numbers on the line to find one with lowest square mean error. Suppose I have 4 points, and subset of points a, b, d, with c point in between points b and d. All points are in increasing order: a__b__c___d.

Now, I have square mean error for a, b, and d: f(a, b, d) = (a-m)^2 + (b-m)^2 + (d-m)^2, where m is arithmetic mean of the subset. My intuition tells me, that if I now choose subset a, b, and c, then f(a, b, c) < f(a, b, d).

HOW can I prove THIS? I tried to do it myself using geometry and using a lot of arithmetic but it is still elusive. Cannot also formulate the question on google to look for answer.

whuber
  • 281,159
  • 54
  • 637
  • 1,101
SWIM S.
  • 1,006
  • 9
  • 17

1 Answers1

1

A "mean square error" is a multiple of a variance; the multiplier depends only on the count of data. Thus it appears you want to prove this lemma:

When $x_n \lt x_{n+1}$ are the two largest values in a dataset of $n+1$ values $(x_1,x_2,\ldots, x_n, x_{n+1}),$ the variance of $(x_1,x_2,\ldots, x_{n-1}, x_{n+1})$ exceeds the variance of $(x_1,x_2,\ldots,x_{n-1},x_n).$

The variance is proportional to the sum of squares of all possible differences of the data. (Again, the constant of proportionality depends only on the count $n.$) When you replace $x_n$ by $x_{n+1},$ you increase all the differences between the highest value and the other $n-1$ values, eaving the count the same and without changing any of the other differences. Therefore the variance increases, QED.


Comments

Because there are always many ways to demonstrate algebraic relations like these, there are other solutions to this one. https://stats.stackexchange.com/a/361801/919 shows another way: apply it to remove $x_n$ and then to adjoin $x_{n+1}$ to the dataset.

A purely algebraic demonstration of the key assertion -- the variance can be expressed as a sum of squared differences -- appears at https://stats.stackexchange.com/a/225758/919.

If you prefer a Calculus approach, you can also define a function $f(x)$ to be the variance of the dataset $(x_1,x_2,\ldots, x_{n-1},x)$ for all $x$ exceeding the largest of the $x_i.$ This is a quadratic function of $x,$ so compute the derivative $f^\prime(x)$ and show it is always positive.

whuber
  • 281,159
  • 54
  • 637
  • 1,101