
As stated in Wikipedia:

> Intuitively, we can understand that a breakdown point cannot exceed 50% because if more than half of the observations are contaminated, it is not possible to distinguish between the underlying distribution and the contaminating distribution (Rousseeuw & Leroy, 1986). Therefore, the maximum breakdown point is 0.5.

What is the intuition behind this?

StubbornAtom
JustJ
  • When you suppose data can come from two or more sources, at most one source can contribute more than 50% of the data. *That's all this is saying.* The illustrations in my (closely) related post at https://stats.stackexchange.com/a/114363/919 might help with the intuition. – whuber Jan 14 '22 at 17:43
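
A short worked version of that mixture point, assuming the usual $\varepsilon$-contamination model with underlying distribution $F$ and contaminating distribution $G$: the observed distribution is

$$F_\varepsilon = (1-\varepsilon)\,F + \varepsilon\,G.$$

At $\varepsilon = 1/2$ this equals $\tfrac12 F + \tfrac12 G$, which is symmetric in $F$ and $G$, and for $\varepsilon > 1/2$ we can rewrite it as $F_\varepsilon = (1-\varepsilon')\,G + \varepsilon'\,F$ with $\varepsilon' = 1-\varepsilon < 1/2$, i.e. the same mixture is equally well described as a majority sample from $G$ contaminated by $F$. Nothing computed from $F_\varepsilon$ alone can tell which of the two is "the" underlying distribution, which is why 50% is the natural ceiling.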

1 Answer


First realise that there is a mathematical theorem behind this, which has assumptions. The statement isn't true in general. A standard assumption is affine equivariance, which roughly means that this only holds if estimators "move" in a certain sense with the data. For example, if you compute the sample mean and then add 5 to all observations, the mean moves by 5 as well. In particular, the estimator needs to be able to move off to infinity if the data are changed more and more. Technically, estimating the mean as 0 independently of the data defines an "estimator" as well (a very crappy one!), and this estimator has a breakdown point of 100%: regardless of what the data are and how much you change them, it will always be the same!
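
A minimal numerical sketch of this point, with illustrative function names: the sample mean moves with the data and is dragged to infinity by even a single wild observation, while the degenerate always-zero "estimator" never moves at all.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)          # a clean sample of size n = 100

def mean_estimator(data):
    """Sample mean: shift-equivariant, so it 'moves with' the data."""
    return float(np.mean(data))

def zero_estimator(data):
    """Degenerate 'estimator' that ignores the data entirely."""
    return 0.0

# Shift equivariance: adding 5 to every observation moves the mean by 5.
print(mean_estimator(x + 5) - mean_estimator(x))        # ~5.0

# Contaminate more and more observations with a huge value.
for n_bad in (1, 50, 99):
    y = x.copy()
    y[:n_bad] = 1e12                                    # replace n_bad observations
    print(n_bad, mean_estimator(y), zero_estimator(y))
# The mean explodes as soon as one observation is replaced (breakdown point 1/n),
# while the always-zero "estimator" never changes, however much data is replaced.
```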

Now imagine you have a (reasonably flexible, see above) estimator $T$ that has a breakdown point of 60%. So you have a data set, say $x=(x_1,\ldots,x_{100})$. Having a breakdown point of 60% means that you can keep the observations $x_1,\ldots,x_{41}$, replace the other 59% of the observations with anything else, and the estimator will still stay in a neighborhood of the original value $T(x)$.
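
One common way to make "breakdown point" precise here is the finite-sample replacement breakdown point: writing $\mathcal{X}_m(x)$ for the set of data sets obtained from $x$ by replacing at most $m$ of the $n$ observations with arbitrary values,

$$\varepsilon_n^*(T,x) \;=\; \min\Big\{\tfrac{m}{n} \;:\; \sup_{x' \in \mathcal{X}_m(x)} \big\|T(x') - T(x)\big\| = \infty\Big\}.$$

With $n=100$ and $\varepsilon_n^*(T,x) = 0.6$, every $x'$ that keeps $x_1,\ldots,x_{41}$ and replaces the other 59 observations satisfies a common bound on $\|T(x')-T(x)\|$, which is exactly the statement used above.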

Now imagine a sequence of other data sets $y^k=(y_1^k,\ldots,y^k_{100})$ for $k\to\infty$ such that $T(y^k)\to\infty$ (which is possible because of the affine equivariance assumption, see above), i.e., $T(y^k)$ can be arbitrarily far away from $T(x)$. If the estimator has a breakdown point of 60%, you can replace 59% of the observations of $y^k$ and the resulting estimate will stay in a neighborhood of $T(y^k)$, hence still be arbitrarily far away from $T(x)$ for large $k$.

But this isn't possible, because when replacing 59% of the observations of $y^k$, you may well introduce $x_1,\ldots,x_{41}$ into the data set, and then the estimator needs to be close to $T(x)$ as explained in the paragraph before. So there is one >40% portion of the data that requires the estimator to be in one place, and another >40% portion that requires it to be in a totally different place. Both cannot hold at the same time.
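
Spelling out the counting for $n=100$ and the hypothetical 60% breakdown point: consider the mixed data set

$$z = (x_1,\ldots,x_{41},\, y^k_{42},\ldots,y^k_{100}).$$

It arises from $x$ by replacing $59 \leq 59$ observations, so $T(z)$ has to stay in a bounded neighborhood of $T(x)$; but it also arises from $y^k$ by replacing only $41 \leq 59$ observations, so $T(z)$ has to stay close to $T(y^k)$, which is arbitrarily far from $T(x)$ for large $k$. Both requirements cannot hold at once.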

This can only be avoided by having a breakdown point <50%.

Christian Hennig
  • The claim of "$<50\%$" seems to not be true, at least depending on the definition of breakdown point. The median has a "finite sample breakdown point" of half, _rounded up_, according to [Davies, Gather, THE BREAKDOWN POINT — EXAMPLES AND COUNTEREXAMPLES, 2007, REVSTAT] – user551504 Jan 14 '22 at 15:01
  • @user551504 There are various definitions of breakdown point around. In particular you can define the breakdown point as the largest proportion at which breakdown does not occur (in which case it's smaller than 50%) and as the smallest proportion at which breakdown occurs (in which case it can be 50% exactly but not larger). The implications are the same. There's even more, for example there are "replacement" and "addition" breakdown points, and breakdown points for distributions rather than data sets. I thought I'd keep things simple... – Christian Hennig Jan 14 '22 at 16:41
  • Sure there's a theorem here--but it's trivial, as I explained in a comment to the question. We have to understand the context of the Wikipedia quotation, which is based on a model in which data coming from one process are "contaminated" by data from another process. Thus, the situation implicitly concerns a *mixture.* Discussion of various definitions of breakdown, of equivariance, *etc.* seems beside the point. – whuber Jan 14 '22 at 17:46
  • @whuber If the theorem were trivial, it should be clear for what kind of estimators it holds and for what kind it doesn't, but that isn't so trivial after all. Note that there are reasonable estimators that are not affine equivariant and can reach a higher breakdown point, which depends on the sample, the specific estimation problem, and the precise breakdown concept in use. – Christian Hennig Jan 14 '22 at 18:10
  • This has nothing to do with the estimators and everything to do with the model being discussed in the quotation. The quotation holds for *all* estimators, without exception, under this mixture assumption, precisely because it relies on such a simple mathematical fact. – whuber Jan 14 '22 at 19:27
  • Hi @whuber, I'm having trouble understanding your claim that the estimator doesn't matter. Take the estimator $T(X_1, \dots, X_n)$ which equals the $X_i$ closest to zero, i.e. $T(X_1,\dots,X_n) = X_i$ with $|X_i| \leq |X_j|$ for all $j$. Would you claim the breakdown point of this estimator is anything other than $1$? – user551504 Jan 17 '22 at 21:33
  • Or, for any estimator $T$ that you think of: would the breakdown point of the estimator $T^*(X_1, \dots, X_n) = \max\{-n, \min\{n, T(X_1, \dots, X_n)\}\}$ be less than or equal to $1/2$? The key point is that not all estimators diverge if a majority of the sample is carefully manipulated. – user551504 Jan 17 '22 at 21:44
  • @ChristianHennig do whuber's comments make sense to you? the estimator seems to clearly matter, and equivariance is a way to restrict to a class of estimators – user551504 Jan 20 '22 at 16:46
  • @user551504 I stand by what I had written earlier. Intuitively whuber has a point, but as this does not hold for all estimators, it matters for sure what estimators we are considering. – Christian Hennig Jan 20 '22 at 18:06
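
A small numerical sketch of the two estimators proposed in the comments above (the implementations are one plausible reading of those comments; for $T^*$ the base estimator is taken to be the sample mean): neither one is carried off to infinity in this setup, even when a clear majority of the sample is replaced, which is why some restriction such as affine equivariance is needed for the "at most 50%" statement.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=100)                 # clean sample, n = 100

def closest_to_zero(data):
    """The observation with the smallest absolute value."""
    return float(data[np.argmin(np.abs(data))])

def clipped_mean(data):
    """T* from the comments, using the mean as base estimator, clipped to [-n, n]."""
    n = len(data)
    return max(-n, min(n, float(np.mean(data))))

# Replace a clear majority (90 of 100) of the observations by a huge value.
y = x.copy()
y[:90] = 1e12

print(closest_to_zero(x), closest_to_zero(y))   # both small: still one of the clean points
print(clipped_mean(x), clipped_mean(y))         # the contaminated value is capped at n = 100
# Neither estimate is driven to infinity here, despite 90% contamination.
```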