Determine separation between two modes from distribution

Question

I’ve got a sample of pairwise distances between points in a 2D picture. Some of these points lie within the same object, so their distance to each other is smaller than some well-defined threshold (the object’s diameter). Points that lie in different objects (predominantly) have a pairwise distance greater than said threshold. Pairs of points that lie within the same object are, however, rare (<10%).

I would like to determine this distance threshold empirically from my sample.

For “appropriate” parameters (well, herein lies the rub, doesn’t it?) the threshold is visible in the density plot:

[Figure: density plot of the pairwise distances, with an arrow marking the threshold]

The threshold is marked by the arrow. This is the objectively right cut-off for my application: it is the dip after the first tall plateau, which corresponds to the distribution of the few points lying within the same object. It also matches the object diameter, which can be individually verified in the original picture but not easily deduced automatically from my data.

Unfortunately, I have no idea how to determine it in an automated fashion. Even the adjust argument / bandwidth for the density function has been found by trial and error, and a different input data set I’ve tried requires a different bandwidth.
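
To make concrete what is being tuned here, the following is a minimal sketch of a rule-of-thumb bandwidth and of how an `adjust` factor scales it. It assumes the density function in question is R's `density()`, whose default bandwidth is 0.9 * min(sd, IQR / 1.34) * n^(-1/5) and whose `adjust` argument is a plain multiplier. The function name `rule_of_thumb_bandwidth` is made up for illustration, and the zero-spread fallback is simplified.

```python
import statistics


def rule_of_thumb_bandwidth(sample, adjust=1.0):
    """Gaussian-kernel bandwidth: adjust * 0.9 * min(sd, IQR / 1.34) * n^(-1/5).

    Assumption: this mirrors the default rule of R's density(); the
    handling of a zero spread below is a simplification.
    """
    sd = statistics.stdev(sample)  # sample standard deviation (n - 1)
    # Quartiles by linear interpolation between order statistics.
    q1, _, q3 = statistics.quantiles(sample, n=4, method="inclusive")
    spread = min(sd, (q3 - q1) / 1.34)
    if spread == 0:
        spread = sd or 1.0  # simplified fallback for degenerate samples
    return adjust * 0.9 * spread * len(sample) ** -0.2


if __name__ == "__main__":
    distances = [1.0, 2.0, 3.0, 4.0, 5.0]  # toy values, not real data
    print(rule_of_thumb_bandwidth(distances))
    print(rule_of_thumb_bandwidth(distances, adjust=0.5))
```

Because the rule depends on the spread and size of the sample, a fixed `adjust` value tuned on one data set will not generally produce the same amount of smoothing on another.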

Is there any hope? Or should I just give up?

As a point of terminology, I wouldn't use the term "cut-off" for an interior point. As a point of statistical interpretation, your arrow points to a minor antimode, but just from (a) general data analysis experience and (b) complete ignorance of your application, I can't see any reason to take it more seriously than any other such detail. I would want to know much more about the raw data (e.g. any granularity in recording), to see results of varying bandwidth and kernel in density estimation, and to see such a feature being repeatedly reproducible in different datasets. — Nick Cox, Nov 07 '13 at 00:24
@Nick Right, “cut-off” is a complete misnomer that slipped in because of the application I need it for. Regarding your (b), I’ll amend the question. In a nutshell, I expect a distribution with two peaks: one of points which lie closer to each other than my cut-off, and one distribution of all other pairwise distances. — Konrad Rudolph, Nov 07 '13 at 00:35
@Nick I’ve essentially rewritten most of the question – sorry for that; however, I’ve realised that the first description was complete crap. I hope this one’s better. — Konrad Rudolph, Nov 07 '13 at 00:46
Optimistically, it is the first local minimum on the density function, with higher densities on either side. That's computable. But to be convincing, it has to persist over a good range of possible bandwidths. Compare work on mode trees, e.g. http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.30.5736 — Nick Cox, Nov 07 '13 at 00:51
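
The suggestion in the last comment (take the first local minimum of the density, and check that it persists over a range of bandwidths) can be sketched roughly as follows. This is only an illustration under stated assumptions: it uses a hand-written Gaussian kernel density estimate on a regular grid, the function names are hypothetical, and the sample is synthetic (about 10% of the values near 2, the rest near 10), not the data from the question. It is not the mode-tree method from the linked reference.

```python
import math
import random


def gaussian_kde_on_grid(sample, bandwidth, grid):
    """Evaluate a Gaussian kernel density estimate at each grid point."""
    norm = 1.0 / (len(sample) * bandwidth * math.sqrt(2.0 * math.pi))
    return [
        norm * sum(math.exp(-0.5 * ((g - x) / bandwidth) ** 2) for x in sample)
        for g in grid
    ]


def first_antimode(sample, bandwidth, n_grid=256):
    """Location of the first local minimum that follows a local maximum
    of the density estimate, or None if the estimate has no such dip."""
    lo, hi = min(sample), max(sample)
    step = (hi - lo) / (n_grid - 1)
    grid = [lo + i * step for i in range(n_grid)]
    dens = gaussian_kde_on_grid(sample, bandwidth, grid)
    seen_peak = False
    for i in range(1, n_grid - 1):
        if dens[i - 1] < dens[i] >= dens[i + 1]:
            seen_peak = True  # passed a local maximum
        elif seen_peak and dens[i - 1] > dens[i] <= dens[i + 1]:
            return grid[i]  # first dip with higher density on both sides
    return None


def antimode_by_bandwidth(sample, bandwidths):
    """Map each bandwidth to the first antimode found with it."""
    return {bw: first_antimode(sample, bw) for bw in bandwidths}


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic stand-in for pairwise distances (hypothetical numbers).
    sample = [rng.gauss(2.0, 0.5) for _ in range(50)]
    sample += [rng.gauss(10.0, 2.0) for _ in range(450)]
    for bw, loc in antimode_by_bandwidth(sample, [0.5, 1.0, 2.0, 4.0]).items():
        print(bw, loc)
```

If the reported location stays roughly constant over a good range of bandwidths, that supports treating it as a real feature; if it jumps around or disappears (returns `None`) as soon as the bandwidth changes a little, it is more likely an artifact of the smoothing.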
